Bugzilla – Bug 102565
acpi hangs the system for 2 - 3 seconds every 30 seconds
Last modified: 2007-06-06 20:09:02 UTC
On an AMD64 single-processor system (westernhagen), one of the first ones we got, kacpid has now accumulated 1040 minutes of runtime. The system "hangs" for 2 - 3 seconds every 30 seconds, but I do not know why. /var/log/messages contains lots of these:
[ACPI Debug] String: [0x0F] "Existing RTMP()"
[ACPI Debug] String: [0x0F] "Entering RTMP()"
[ACPI Debug] String: [0x0F] "Entering TIN2()"
[ACPI Debug] String: [0x0F] "Existing RTMP()"
which seem to appear around the hangs.
First I would upgrade the BIOS. Can you use oprofile to find out which functions use a lot of CPU time?
Created attachment 45119
xx
opreport -l output, top lines...
acpi_ut_track_allocation is the problem, I guess.
OK, this means we need to disable CONFIG_ACPI_DEBUG again. Thomas, what do you think?
What a pity. There is a lot of traffic on the ACPI list and in the ACPI code currently. I expect a lot of bugs in the next few days, and without ACPI debug it's really hard to see anything. Could we wait a few more days? Switching back to ACPI_DEBUG_LITE shouldn't be a big problem in e.g. Beta3. Marcus: could you also test the newest kernel? There have been ACPI bugfixes in every RC-X over the last few days. Can you also give it a try with the ec_polling boot param?
OK, agreed. We should keep it on for a while longer.
I tried the beta1 kernel ... the hangs are gone now. I think this is fixed.
Actually, after 1 day of use it is starting to happen again... apparently this is only exposed after some time of use?
*** Bug 103000 has been marked as a duplicate of this bug. ***
This is because ACPI_DEBUG is enabled and EC burst mode is no longer enabled by default. For a fast workaround please try: ec_burst=burst
The patch was submitted around -rc3 and enabled by default. It was thrown out at -rc5, then modified and re-added in -rc6 (not enabled by default). This patch is known to fix some machines (e.g. pressing the ACPI power button results in nothing, then pressing the sleep/lid button results in power button -> #61106, and some other strange issues), but it is also known to break some machines, e.g. the ASUS L5D (too lazy to search for the bugzilla.kernel.org bug).
This is really difficult. We could go for:
- ACPI_DEBUG (which I think is convenient, especially for the SL product)
- ec_burst=burst by default
- try to blacklist machines where burst mode does not work
or
- !ACPI_DEBUG
- ec_burst=polling by default
- try to blacklist machines where polling mode does not work
I'd say we go for burst mode by default and blacklist the machines where it does not work. I'd like to see ACPI_DEBUG=y in our final SL product (Andi?).
Either way, you're proposing a blacklist. How can e.g. the L5D be identified? Which other machines suffer from the problem? Since you cannot guarantee a sane decision for all systems with a blacklist, I'd prefer:
* ACPI_DEBUG on
* ec_burst=burst
* No blacklist
Since mainline is aware of the problem, we'll have a fixed kernel sometime (and provide an update) for the few machines we'll break by setting ec_burst=burst for now.
removing mobile@suse.de from CC, we're aware of the bug and will follow-up using bugzilla
*** Bug 102954 has been marked as a duplicate of this bug. ***
I'm not very happy about ACPI_DEBUG=on, but if no one from the kernel list objects and aj agrees, I think I can live with it. I am raising the severity, since activating ACPI_DEBUG is intrusive and 'normal' is no longer justified for it.
I suspect only some parts of all the stuff enabled by ACPI_DEBUG cause problems, e.g. on Marcus' Solo it seems to be the memory tracking that is slow. It would probably be best to just disable that part. We mainly want it just for the tracing capabilities.
Is there an option to disable it where it causes problems?
Sorry, I misinterpreted the code: the option should be ec_burst=1 or ec_burst=0 to enable or disable burst mode (if explicitly set by boot param you should see "EC burst mode." in syslog at KERN_INFO level). I just sent olh a patch to enable burst mode by default.
#16: I can have a look, but ACPI_DEBUG is an ugly bunch of code. I'll contact Marcus for testing on his machine (westernhagen) to see if I can come up with a patch that compiles out the offending debug parts.
Ref 102954: the pre4 kernel showed the same behaviour with kacpid after a while. I have now reverted to kernel-default-2.6.13_rc6_git1-2. kacpid runs normally for now. It looks like something triggers it to hog the CPU. Will reboot with ec_burst=1 now.
I'm using yesterday's CVS kernel (2.6.13-rc6 based) with ec_burst=1 now, and this seems to help interactive behavior. kacpid is still burning a lot of CPU cycles though. With an uptime of 20 minutes, kacpid is already at 12 seconds of CPU time (i.e. it takes 1% of the CPU).
echo 3 > /proc/acpi/debug_level
does not change the picture.
Okay, same as before. It still spends close to 25% of its time in acpi_ut_track_allocation.
Created attachment 45491
Don't do a dumb linear search in memory debugging
The algorithm used to find existing objects is just extremely dumb. It always walks through all objects. This patch just disables that; I doubt it is very useful for us anyway.
Can you test if it helps? If yes, I will check in the patch.
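To illustrate why a linear walk on every tracked allocation gets expensive, here is a minimal Python sketch. It is not the ACPICA code and not the attached patch: `AllocationTracker`, `search_existing` and `comparisons_for` are hypothetical names, and the sketch assumes the tracker scans the list of live objects for an existing entry before adding a new one. With the search enabled, tracking n objects inspects n(n-1)/2 list entries in total; with it disabled, none.

```python
class AllocationTracker:
    """Toy model of a debug allocation list (hypothetical, not ACPICA code)."""

    def __init__(self, search_existing=True):
        self.search_existing = search_existing
        self.live = []        # every tracked allocation, in insertion order
        self.comparisons = 0  # number of list entries inspected so far

    def _find(self, address):
        # Linear walk over all live objects: O(n) per call.
        for entry in self.live:
            self.comparisons += 1
            if entry == address:
                return entry
        return None

    def track(self, address):
        # Assumption: with the search enabled, each new allocation first
        # scans the whole list to check that it is not already tracked.
        if self.search_existing and self._find(address) is not None:
            raise ValueError(f"address {address:#x} already tracked")
        self.live.append(address)


def comparisons_for(n, search_existing):
    """Track n distinct fake addresses and return the comparison count."""
    tracker = AllocationTracker(search_existing)
    for i in range(n):
        tracker.track(0x1000 + 16 * i)
    return tracker.comparisons


if __name__ == "__main__":
    for n in (100, 1000):
        print(n, comparisons_for(n, True), comparisons_for(n, False))
```

The quadratic growth is the point: ten times as many live objects costs roughly a hundred times as many comparisons, which would fit a problem that only shows up after the machine has been running for a while.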
Yes, I can see this patch would help :) I'll give it a try, with and without ec_burst=1.
Much better. Even with ec_burst=0, kacpid is eating slightly less than 0.5 seconds of CPU time per minute of wall-clock time. That is still on the order of 10 minutes per day of uptime, so people will notice and complain, but it shouldn't be noticeable performance-wise anymore, I think.
Can you oprofile again on an idle machine? (Just out of curiosity.)
I committed the patch now.
-------------------------------------------------------------------
Wed Aug 10 12:55:33 CEST 2005 - ak@suse.de
- patches.fixes/acpi-no-search: Disable too slow object search in ACPI memory tracking (#102565)
I looked at oprofile, and it's very much better indeed. acpi_ut_track_allocation turns up at less than 0.2 percent on an idle machine. The most prominent ACPI debug related symbol to show up in oprofile is acpi_ut_debug_print (1.7 percent).
BTW, ec_burst=1 does NOT help. After running overnight, kacpid shows almost 40% CPU load:
7 root 12 -5 0 0 0 S 39.7 0.0 138:11.81 kacpid
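The CPU-time figures quoted in this thread can be checked with a little arithmetic. The helpers below are only an illustrative sketch (the function names are made up here); they use the numbers from the comments: 12 seconds of CPU time after 20 minutes of uptime, and 0.5 seconds of CPU time per wall-clock minute.

```python
def cpu_share(cpu_seconds, wall_seconds):
    """Fraction of one CPU consumed over the given wall-clock interval."""
    return cpu_seconds / wall_seconds


def cpu_minutes_per_day(cpu_seconds_per_wall_minute):
    """Project a per-minute CPU cost onto 24 hours of uptime, in minutes."""
    wall_minutes_per_day = 24 * 60
    return cpu_seconds_per_wall_minute * wall_minutes_per_day / 60


if __name__ == "__main__":
    # Before the patch: 12 s of CPU in 20 min of uptime.
    print(f"{cpu_share(12, 20 * 60):.1%}")
    # After the patch: just under 0.5 s of CPU per minute of wall-clock time.
    print(f"{cpu_minutes_per_day(0.5):.0f} CPU minutes per day")
```

So 0.5 seconds per minute is an upper bound of 12 CPU minutes per day, consistent with "on the order of 10 minutes".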
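For readers unfamiliar with the top output quoted above, the following sketch labels its fields. It assumes the line uses top's default column order (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND); the helper names are hypothetical.

```python
TOP_COLUMNS = ["PID", "USER", "PR", "NI", "VIRT", "RES", "SHR",
               "S", "%CPU", "%MEM", "TIME+", "COMMAND"]


def parse_top_line(line):
    """Split one top process line into a dict keyed by column name."""
    # Limit the number of splits so a command containing spaces stays whole.
    fields = line.split(None, len(TOP_COLUMNS) - 1)
    if len(fields) != len(TOP_COLUMNS):
        raise ValueError("unexpected number of columns in top line")
    return dict(zip(TOP_COLUMNS, fields))


def time_plus_to_seconds(value):
    """Convert a TIME+ value of the form minutes:seconds.hundredths."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + float(seconds)


if __name__ == "__main__":
    row = parse_top_line("7 root 12 -5 0 0 0 S 39.7 0.0 138:11.81 kacpid")
    print(row["COMMAND"], row["%CPU"], time_plus_to_seconds(row["TIME+"]))
```

Read this way, 39.7 is the current CPU load of kacpid in percent, and 138:11.81 is its accumulated CPU time, a little over 138 minutes.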
oprofile again please.
*** Bug 105290 has been marked as a duplicate of this bug. ***