Bugzilla – Full Text Bug Listing
| Summary: | kernel freeze: ondemand governor freezes AMD Opteron machine | ||
|---|---|---|---|
| Product: | [openSUSE] SUSE LINUX 10.0 | Reporter: | Hannes Reinecke <hare> |
| Component: | Kernel | Assignee: | Olaf Kirch <okir> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Critical | ||
| Priority: | P5 - None | CC: | aj, mark.langsdorf, ro, trenn, venkatesh.pallipadi |
| Version: | Beta 1 | ||
| Target Milestone: | --- | ||
| Hardware: | x86-64 | ||
| OS: | All | ||
| Whiteboard: | |||
| Found By: | Development | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | messages.txt; Update CPU freq policy (allowed min/max/...) on every CPU load check; Retry to Pstate change request until they are successful | ||

Description
Hannes Reinecke
2005-08-10 11:35:03 UTC
Created attachment 45499 [details]
messages.txt
/var/log/messages
Dubious thing is this:
events/1 D ffffffff805d2280 0 11 1 12 10 (L-TLB)
ffff8100bfd03d28 0000000000000046 0000000000000282 ffffffff8036e217
ffff8100bfd03cb8 ffffffff805d2800 ffffffff805d2800 ffffffff805d2800
0000000000000080 000000000000015f
Call Trace:
<ffffffff8036e217>{thread_return+145}
<ffffffff801368b6>{find_busiest_group+358}
<ffffffff8036d137>{__down+247}
<ffffffff801372c0>{default_wake_function+0}
<ffffffff8036e1d8>{thread_return+82}
<ffffffff8036ed40>{__down_failed+53}
<ffffffff802e599b>{.text.lock.cpufreq+5}
<ffffffff882ca78e>{:cpufreq_ondemand:do_dbs_timer+510}
<ffffffff882ca590>{:cpufreq_ondemand:do_dbs_timer+0}
<ffffffff8014e08e>{worker_thread+478}
<ffffffff801372c0>{default_wake_function+0}
<ffffffff801338a3>{__wake_up_common+67}
<ffffffff8014deb0>{worker_thread+0}
<ffffffff80152f73>{kthread+243}
<ffffffff80137e30>{schedule_tail+64}
<ffffffff8010fa52>{child_rip+8}
<ffffffff80152e80>{kthread+0}
<ffffffff8010fa4a>{child_rip+0}
Doesn't look too good ...
Agreed. Venkatesh, do you have any ideas? (I assume it's really blocking in ondemand and not stack garbage)

I've disabled powersave now and updated to current kotd.

Haven't seen something like this before. What platform is this hang being seen on? Can I get the pointer to failing kotd so that I can test it on some local system here?

2 dualcore Opteron CPUs. This was with the 10.0 beta1 kernel.

Oh I thought it was an Intel system because of the DBS in "do_dbs_timer". But it seems to be used in the generic ondemand code. My mistake.

Sorry Venkatesh, someone else's problem then.

The machine was running overnight without problems. This morning I started powersave again and 2 hours later the machine had again no keyboard:
sysreq-t shows again:
Aug 11 08:45:03 reger kernel: events/1 D ffff8100bfcc4ed0 0 11 1 12 10 (L-TLB)
Aug 11 08:45:03 reger kernel: ffff8100022a9d28 0000000000000046 0000000000000282
ffffffff8036e137
Aug 11 08:45:03 reger kernel: ffff8100022a9cb8 ffffffff805d2800
ffffffff805d2800 ffffffff805d2800
Aug 11 08:45:03 reger kernel: 0000000000000001 000000000002c0cc
Aug 11 08:45:03 reger kernel: Call Trace:<ffffffff8036e137>{thread_return+145}
<ffffffff801368f6>{find_busiest_group+358}
Aug 11 08:45:03 reger kernel: <ffffffff8036d057>{__down+247}
<ffffffff80137300>{default_wake_function+0}
Aug 11 08:45:03 reger kernel: <ffffffff8036e0f8>{thread_return+82}
<ffffffff8036ec60>{__down_failed+53}
Aug 11 08:45:03 reger kernel: <ffffffff802e586b>{.text.lock.cpufreq+5}
<ffffffff883aa78e>{:cpufreq_ondemand:do_dbs_timer+510}
Aug 11 08:45:03 reger kernel:
<ffffffff883aa590>{:cpufreq_ondemand:do_dbs_timer+0}
Aug 11 08:45:03 reger kernel: <ffffffff8014e0ce>{worker_thread+478}
<ffffffff80137300>{default_wake_function+0}
Aug 11 08:45:03 reger kernel: <ffffffff801338e3>{__wake_up_common+67}
<ffffffff8014def0>{worker_thread+0}
Aug 11 08:45:03 reger kernel: <ffffffff80152fb3>{kthread+243}
<ffffffff80137e70>{schedule_tail+64}
Aug 11 08:45:03 reger kernel: <ffffffff8010fa52>{child_rip+8}
<ffffffff80152ec0>{kthread+0}
Aug 11 08:45:03 reger kernel: <ffffffff8010fa4a>{child_rip+0}
Anything else I can test?
I'm running now with the userspace governor instead of the ondemand governor.

Mark, have you ever tested powernow-k8 with the kernel ondemand governor?

No, ondemand hasn't really been usable until the last six months. I'll do some tests with the latest kotd and see if I can figure it out.

Not usable in what way? If there are too many problems we can go back to the user space code in 10.0.

I don't remember the details - it was a bit unstable when I looked at it a year ago. I think it's stabilized since and it probably needs more exposure. I'll do some testing tomorrow, I hope.

Let's change the subject..

Mark, any results from your testing?

Just an idea: While searching for why the ondemand governor does not recognise thermal cooling and does not lower the frequency accordingly, I found out that it always uses the min/max frequency exported at startup. Maybe the cpufreq core rereads new cpufreq settings due to an ACPI processor event, while the ondemand governor still uses the old min/max and tries to set a frequency that is not allowed? I will attach a patch that makes ondemand update the cpufreq steps on every CPU load check. This probably won't go mainline; it should instead be done through a notifier chain so that changes are only checked when they really happen. Whether those problems are related is something you could already find out with this one. Mark: Do you think this makes sense and could be the culprit?

Created attachment 46867 [details]
Update CPU freq policy (allowed min/max/...) on every CPU load check
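
To make the hypothesis in the comment above concrete, here is a minimal user-space sketch in Python. It is not the attached patch and not kernel code; it only models the difference between a governor that caches the policy limits at startup and one that re-reads them on every load check. The names `Policy`, `StaleGovernor`, `RefreshingGovernor`, the 80% load threshold, and the frequencies are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical stand-in for the limits currently allowed (in kHz)."""
    min_khz: int
    max_khz: int


class StaleGovernor:
    """Captures min/max once at startup, as the comment suspects ondemand does."""

    def __init__(self, policy: Policy):
        self.min_khz = policy.min_khz
        self.max_khz = policy.max_khz

    def target(self, policy: Policy, load_percent: int) -> int:
        # Ignores the live policy and uses the values captured at startup.
        return self.max_khz if load_percent > 80 else self.min_khz


class RefreshingGovernor:
    """Re-reads the allowed limits on every load check."""

    def target(self, policy: Policy, load_percent: int) -> int:
        return policy.max_khz if load_percent > 80 else policy.min_khz


def is_allowed(policy: Policy, khz: int) -> bool:
    """True if the requested frequency lies inside the current limits."""
    return policy.min_khz <= khz <= policy.max_khz


policy = Policy(min_khz=1_000_000, max_khz=2_200_000)
stale = StaleGovernor(policy)
fresh = RefreshingGovernor()

# Simulate an event (for example thermal) lowering the permitted maximum.
policy.max_khz = 1_800_000

stale_request = stale.target(policy, load_percent=95)
fresh_request = fresh.target(policy, load_percent=95)
print(is_allowed(policy, stale_request), is_allowed(policy, fresh_request))
```

In this model the stale governor asks for a frequency above the new maximum, while the refreshing one stays inside the limits. Whether the real hang has this cause is exactly the open question in the thread.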
I think we should just disable ondemand on AMD systems as discussed on kernel@.

Upgrading to critical to put it onto the Radar.

Thomas, can you add such code to the powersaved init scripts?

Created attachment 56705 [details]
Retry to Pstate change request until they are successful
Submitted on cpufreq list by Mark some days ago.
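
The attachment title describes retrying a P-state change request until it succeeds. The patch itself is not reproduced in this listing, so the following is only a generic retry sketch in Python, not Mark's code. `change_with_retry`, `FlakyHardware`, `request_pstate_change`, and the attempt limit are hypothetical; the limit in particular is an assumption added here so that the sketch cannot loop forever.

```python
from typing import Callable


def change_with_retry(request: Callable[[int], bool], pstate: int,
                      max_attempts: int = 5) -> int:
    """Call request(pstate) until it reports success; return attempts used.

    Raises RuntimeError if every attempt fails. The bound is an assumption
    of this sketch, not something stated in the bug report.
    """
    for attempt in range(1, max_attempts + 1):
        if request(pstate):
            return attempt
    raise RuntimeError(
        f"P-state {pstate} not accepted after {max_attempts} attempts")


class FlakyHardware:
    """Hypothetical device that rejects the first few change requests."""

    def __init__(self, failures_before_success: int):
        self.remaining_failures = failures_before_success
        self.current_pstate = 0

    def request_pstate_change(self, pstate: int) -> bool:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            return False
        self.current_pstate = pstate
        return True


hw = FlakyHardware(failures_before_success=2)
attempts = change_with_retry(hw.request_pstate_change, pstate=3)
print(attempts, hw.current_pstate)
```

A bounded loop trades the "until successful" guarantee for a clear failure path, which is easier to reason about when the caller holds a lock.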
Mark, does this fix this bug, or have you heard anything bad about it?

Reassigning to okir for inclusion (only attachment from comment #19). Hare: please reopen if you still have the problem.

The retry to pstate change request patch will probably not affect this bug. I'll see if I can do some testing today on the ondemand governor.

Ah, alright. Sorry, I missed your request for inclusion in comment #20. Will do!

Ohh, be careful, the patch from Mark has a little typo: retrun <-> return. Then it should compile and hopefully work as expected ...

Patch is in CVS for 10.0 now. Thanks!