Bug 102565 - acpi hangs the system for 2 - 3 seconds every 30 seconds
Summary: acpi hangs the system for 2 - 3 seconds every 30 seconds
Status: RESOLVED FIXED
Duplicates: 102954 103000 105290
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Kernel
Version: Preview 4
Hardware: Other All
Priority: P3 - Medium; Severity: Critical
Target Milestone: ---
Assignee: Andreas Kleen
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-08-08 12:00 UTC by Marcus Meissner
Modified: 2007-06-06 20:09 UTC
CC List: 10 users

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
xx (1.31 KB, text/plain)
2005-08-08 12:12 UTC, Marcus Meissner
Don't do a dumb linear search in memory debugging (407 bytes, patch)
2005-08-10 10:52 UTC, Andreas Kleen

Description Marcus Meissner 2005-08-08 12:00:06 UTC
On an AMD64 single-processor system (westernhagen, one of the first ones we
got), kacpid has now accumulated 1040 minutes of runtime.
 
The system "hangs" for 2 - 3 seconds every 30 seconds, but I do not know 
why. 
 
/var/log/messages contains lots of these:
 
[ACPI Debug]  String: [0x0F] "Existing RTMP()" 
[ACPI Debug]  String: [0x0F] "Entering RTMP()" 
[ACPI Debug]  String: [0x0F] "Entering TIN2()" 
[ACPI Debug]  String: [0x0F] "Existing RTMP()" 
 
which seem to appear around the hangs.
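
To check how often each of these messages shows up, the log can be tallied with a short script. This is only a sketch: the regular expression is derived from the four lines quoted above, and the function and constant names are made up for illustration.

```python
import re
from collections import Counter

# Pattern derived from the lines quoted in the description, e.g.
#   [ACPI Debug]  String: [0x0F] "Entering RTMP()"
ACPI_DEBUG_RE = re.compile(
    r'\[ACPI Debug\]\s+String:\s+\[0x[0-9A-Fa-f]+\]\s+"([^"]*)"'
)


def count_acpi_debug_strings(lines):
    """Return a Counter mapping each ACPI debug string to its occurrence count."""
    counts = Counter()
    for line in lines:
        match = ACPI_DEBUG_RE.search(line)  # search() tolerates a syslog prefix
        if match:
            counts[match.group(1)] += 1
    return counts


# The four lines from the description, used as sample input.
SAMPLE = [
    '[ACPI Debug]  String: [0x0F] "Existing RTMP()"',
    '[ACPI Debug]  String: [0x0F] "Entering RTMP()"',
    '[ACPI Debug]  String: [0x0F] "Entering TIN2()"',
    '[ACPI Debug]  String: [0x0F] "Existing RTMP()"',
]

if __name__ == "__main__":
    for text, n in count_acpi_debug_strings(SAMPLE).most_common():
        print(n, text)
```

On a real system the lines would come from /var/log/messages instead of SAMPLE.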
Comment 1 Andreas Kleen 2005-08-08 12:04:29 UTC
First I would upgrade the BIOS.  
 
Can you use oprofile to find out which functions use a lot of CPU time? 
 
  
Comment 2 Marcus Meissner 2005-08-08 12:12:58 UTC
Created attachment 45119 [details]
xx

opreport -l output, toplines...
Comment 3 Marcus Meissner 2005-08-08 12:13:36 UTC
acpi_ut_track_allocation is the problem, I guess.
 
 
Comment 4 Andreas Kleen 2005-08-08 12:26:48 UTC
OK, this means we need to disable CONFIG_ACPI_DEBUG again. Thomas, what do
you think?
 
Comment 5 Thomas Renninger 2005-08-08 13:14:14 UTC
What a pity.
There is a lot of traffic on the ACPI list and in the ACPI code currently.
I expect a lot of bugs in the next few days, and without ACPI debug it's really
hard to see anything.

Could we wait a few more days? Switching back to ACPI_DEBUG_LITE in e.g. Beta3
shouldn't be a big problem.

Marcus: could you also test the newest kernel? There have been ACPI bugfixes in
every RC over the last few days.
Can you also give it a try with the ec_polling boot param?
Comment 6 Andreas Kleen 2005-08-08 13:40:29 UTC
OK, agreed. We should keep it on for a while longer.
Comment 7 Marcus Meissner 2005-08-08 15:52:19 UTC
I tried the beta1 kernel ...

the hangs are gone now.

I think this is fixed.
Comment 8 Marcus Meissner 2005-08-09 11:54:09 UTC
Actually, after 1 day of use it is starting to happen again...
Apparently this only shows up after some time of use?
Comment 9 Thomas Renninger 2005-08-09 15:33:05 UTC
*** Bug 103000 has been marked as a duplicate of this bug. ***
Comment 10 Thomas Renninger 2005-08-09 15:48:30 UTC
This is because ACPI_DEBUG is enabled and EC burst mode is no longer enabled by
default.
For a fast workaround please try:
ec_burst=burst

The patch was submitted around -rc3 and enabled by default.
It was thrown out in -rc5, then modified and re-added in -rc6 (not
enabled by default).

This patch is known to fix some machines (e.g. pressing the ACPI power button
results in nothing, then pressing the sleep/lid button results in power button ->
#61106, and some other strange issues), but it is also known to break some
machines, e.g. ASUS L5D (too lazy to search for the bugzilla.kernel.org bug).

This is really difficult. We could go for:
       - ACPI_DEBUG (which I think is convenient, especially for the SL product)
       - ec_burst=burst by default
       - try to blacklist machines on which burst mode does not work
or
       - !ACPI_DEBUG
       - ec_burst=polling by default
       - try to blacklist machines on which polling mode does not work


I'd say we go for burst mode by default and blacklist the machines that do not work.
I'd like to see ACPI_DEBUG=y in our final SL product (Andi?).
Comment 11 Timo Hoenig 2005-08-09 16:15:31 UTC
Either way, you're proposing a blacklist. How can e.g. the L5D be identified? Which other machines
suffer from the problem?

Since you cannot guarantee a sane decision for all systems with a blacklist, I'd prefer
   * ACPI_DEBUG on
   * ec_burst=burst
   * No blacklist

Since mainline is aware of the problem, we'll have a fixed kernel sometime (and can provide an update)
for the few machines we'll break by setting ec_burst=burst for now.
Comment 12 Timo Hoenig 2005-08-09 16:16:57 UTC
Removing mobile@suse.de from CC; we're aware of the bug and will follow up using bugzilla.
Comment 13 Thomas Renninger 2005-08-09 16:22:55 UTC
*** Bug 102954 has been marked as a duplicate of this bug. ***
Comment 14 Stefan Behlert 2005-08-10 07:29:32 UTC
I'm not very happy about ACPI_DEBUG=on, but if no one from the kernel-list
objects and aj agrees, I think I can live with it.
I am raising the severity, since activating ACPI_DEBUG is intrusive and 'normal'
is no longer justified for it.
Comment 15 Andreas Kleen 2005-08-10 07:39:10 UTC
I suspect only some parts of all the stuff enabled by ACPI_DEBUG cause
problems, e.g. on Marcus' Solo it seems that the memory tracking
is slow. It would probably be best to just disable that part. We mainly
want it just for the tracking capabilities.
 
Comment 16 Andreas Jaeger 2005-08-10 08:02:36 UTC
Is there an option to disable it where it causes problems?
Comment 17 Thomas Renninger 2005-08-10 08:22:16 UTC
Sorry, I misinterpreted the code. The option should be ec_burst=1 or ec_burst=0
to enable or disable burst mode (if explicitly set by boot param you should see
"EC burst mode." in syslog - KERN_INFO syslog level).
I just sent olh a patch to enable burst mode by default.

#16:
I can have a look, but ACPI_DEBUG is an ugly bunch of code.
I'll contact Marcus for testing on his machine (westernhagen) to see if I can
come up with a patch that compiles out the affected debug parts.
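
When comparing test results it helps to confirm which ec_burst value a machine was actually booted with. A minimal sketch, assuming the kernel command line is readable from /proc/cmdline; the function name is hypothetical, and the script only inspects the boot parameter string, not the driver state.

```python
from typing import Optional


def ec_burst_setting(cmdline: str) -> Optional[str]:
    """Return the value of the last ec_burst= token, or None if it is absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("ec_burst="):
            value = token[len("ec_burst="):]
    return value


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    with open("/proc/cmdline") as f:
        setting = ec_burst_setting(f.read())
    if setting is None:
        print("ec_burst not set on the command line (kernel default applies)")
    else:
        print("ec_burst=" + setting)
```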
Comment 18 Klaus Kämpf 2005-08-10 08:51:53 UTC
ref 102954, the pre4 kernel showed the same behaviour with kacpid after a  
while. I now reverted to kernel-default-2.6.13_rc6_git1-2. kacpid runs  
normally for now. It looks like something triggers it to hog the CPU.  
  
Will reboot with ec_burst=1 now 
Comment 19 Olaf Kirch 2005-08-10 10:40:17 UTC
I'm using yesterday's CVS kernel (2.6.13-rc6 based) with ec_burst=1 
now, and this seems to help interactive behavior. 
 
kacpid is still burning a lot of CPU cycles though. With an uptime
of 20 minutes, kacpid is already at 12 seconds of CPU time (i.e. it takes 1% of
the CPU).
 
echo 3 > /proc/acpi/debug_level does not change the picture. 
Comment 20 Olaf Kirch 2005-08-10 10:45:31 UTC
Okay, same as before. It still spends close to 25% of its time in 
acpi_ut_track_allocation. 
Comment 21 Andreas Kleen 2005-08-10 10:52:25 UTC
Created attachment 45491 [details]
Don't do a dumb linear search in memory debugging

The algorithm used to find existing objects
is just extremely dumb. It always walks through
all objects. This patch just disables that;
I doubt it is very useful for us anyway.
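
Why a full walk on every tracked allocation hurts can be seen with a toy model. This is not the ACPI code and not the attached patch (whose contents are not shown in this report); the class and function names are invented for illustration, and the model only counts comparisons.

```python
class LinearTracker:
    """Toy allocation tracker that keeps live allocations in a plain list."""

    def __init__(self, search_existing=True):
        self.live = []
        self.search_existing = search_existing
        self.comparisons = 0

    def track(self, addr):
        if self.search_existing:
            # Walk through all live objects before adding the new one.
            for existing in self.live:
                self.comparisons += 1
                if existing == addr:
                    raise ValueError("address %r is already tracked" % addr)
        self.live.append(addr)


def comparisons_for(n, search_existing):
    """Track n distinct addresses and return the number of comparisons made."""
    tracker = LinearTracker(search_existing)
    for addr in range(n):
        tracker.track(addr)
    return tracker.comparisons


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        print(n, comparisons_for(n, True), comparisons_for(n, False))
```

With the search enabled, the total number of comparisons is n*(n-1)/2, so doubling the number of live objects roughly quadruples the work. With the search skipped, tracking costs no comparisons at all.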
Comment 22 Andreas Kleen 2005-08-10 10:54:57 UTC
Can you test if it helps? If yes I will check in the patch.
 
Comment 23 Olaf Kirch 2005-08-10 10:56:49 UTC
Yes, I can see this patch would help :) 
 
I'll give it a try, with and without ec_burst=1. 
Comment 24 Olaf Kirch 2005-08-10 11:23:38 UTC
Much better. Even with ec_burst=0 kacpid is eating slightly less than 0.5 
seconds of CPU time per minute wall-clock. 
 
That is still on the order of 10 minutes per day of uptime, so people 
will notice and complain - but it shouldn't be noticeable performance-wise 
anymore, I think. 
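
The CPU-time figures quoted in this report can be cross-checked with a few lines of arithmetic. A small sketch with hypothetical helper names; the inputs are the numbers from comments 19 and 24.

```python
def cpu_share_percent(cpu_seconds, wall_seconds):
    """CPU time as a percentage of elapsed wall-clock time."""
    return 100.0 * cpu_seconds / wall_seconds


def cpu_minutes_per_day(cpu_seconds_per_wall_minute):
    """Extrapolate a per-minute CPU cost to minutes of CPU time per 24 hours."""
    wall_minutes_per_day = 24 * 60
    return cpu_seconds_per_wall_minute * wall_minutes_per_day / 60.0


if __name__ == "__main__":
    # Comment 19: 12 seconds of CPU time after 20 minutes of uptime.
    print(cpu_share_percent(12, 20 * 60), "percent")
    # Comment 24: slightly less than 0.5 seconds per wall-clock minute,
    # so 0.5 gives an upper bound for the daily total.
    print(cpu_minutes_per_day(0.5), "minutes per day")
```

At 0.5 seconds per minute the upper bound is 12 minutes per day, which matches the "order of 10 minutes" estimate.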
Comment 25 Andreas Kleen 2005-08-10 11:26:17 UTC
Can you oprofile again on an idle machine? (just out of curiosity)
 
Comment 26 Andreas Kleen 2005-08-10 11:27:58 UTC
I committed the patch now 
 
------------------------------------------------------------------- 
Wed Aug 10 12:55:33 CEST 2005 - ak@suse.de 
 
- patches.fixes/acpi-no-search: Disable too slow object search 
  in ACPI memory tracking (#102565) 
 
 
Comment 27 Olaf Kirch 2005-08-10 12:06:11 UTC
I looked at oprofile, and it's very much better indeed. 
acpi_ut_track_allocation turns up at less than 0.2 percent on an idle 
machine. 
 
The most prominent ACPI debug related symbol to show up in oprofile is 
acpi_ut_debug_print (1.7 percent) 
Comment 28 Klaus Kämpf 2005-08-11 08:06:38 UTC
btw, ec_burst=1 does NOT help. After running over night, kacpid shows almost 
40% CPU load: 
 
    7 root      12  -5     0    0    0 S 39.7  0.0 138:11.81 kacpid 
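
For tracking kacpid over time, a top line like the one above can be turned into numbers. A rough sketch, assuming top's default column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) and a TIME+ field in minutes:seconds.hundredths; the function names are made up.

```python
def parse_top_cpu_time(field):
    """Convert a TIME+ value such as '138:11.81' (minutes:seconds) to seconds."""
    minutes, seconds = field.split(":")
    return int(minutes) * 60 + float(seconds)


def parse_top_line(line):
    """Extract PID, %CPU, accumulated CPU seconds and command from a top line."""
    fields = line.split()
    return {
        "pid": int(fields[0]),
        "cpu_percent": float(fields[8]),
        "cpu_seconds": parse_top_cpu_time(fields[10]),
        "command": fields[11],
    }


# The line quoted in the comment above.
SAMPLE_LINE = "    7 root      12  -5     0    0    0 S 39.7  0.0 138:11.81 kacpid"

if __name__ == "__main__":
    info = parse_top_line(SAMPLE_LINE)
    print(info["command"], info["cpu_percent"], info["cpu_seconds"])
```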
 
Comment 29 Andreas Kleen 2005-08-11 13:12:49 UTC
oprofile again please.
Comment 30 Dirk Mueller 2005-08-17 18:35:21 UTC
*** Bug 105290 has been marked as a duplicate of this bug. ***