Bugzilla – Bug 117884
acpid is leaking a file descriptor on every client disconnect
Last modified: 2007-06-05 11:20:16 UTC
On my Toshiba laptop, acpid and hald-addon-acpi are eating all of the CPU cycles. Top looks like this:

 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5435 root 25 0 2400 572 308 R 65.1 0.1 10:52.92 acpid
5711 root 16 0 1788 172 136 S 27.2 0.0 4:21.36 hald-addon-acpi

/var/log/acpid is full of lines like this:

[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files
[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files
[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files
[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files
[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files
[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files
[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files
[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files

/var/log/acpid grows at about 2MB/sec! It has filled the / filesystem. If I clobber it with "> /var/log/acpid", it starts over and fills the filesystem again.

I'll attach gdb and strace output from attaching to these procs.

It's a vanilla 10.0RC4 install. I haven't done much with it other than install, and suspend to disk + resume.
Created attachment 50358 [details] gdb stacktraces of both processes
Created attachment 50359 [details] strace of acpid
Created attachment 50360 [details] strace of hald-addon-acpi
Created attachment 50362 [details] hwinfo for Toshiba Tecra 9000
I just noticed that the strace.acpid.out might not be very useful since it's full of 'no space left on device' errors. However, even after clobbering the acpid logfile so that there was plenty of free space on the device, the behavior continues. The acpid problem caused the disk to be full, not the other way around. Unfortunately I've since killed acpid, so I can't get an strace while the filesystem isn't full. :( I'll get another if it happens again.
please add output of 'lsof | grep acpi'
I've got strace output now and the filesystem wasn't full. Sorry, forgot to get lsof output. :( I'll get it next time. Thus far this has happened every time I've used the laptop since putting 10.0 on it, so it seems to be a serious problem.
Created attachment 50660 [details] new strace output of acpid
Created attachment 50661 [details] new strace output of hald-addon-acpi
acpid loops with this:

| accept(4, 0xbf9b380a, [2]) = -1 EMFILE (Too many open files)
| time(NULL) = 1127404200
| write(2, "[Thu Sep 22 09:50:00 2005] ", 27) = 27
| write(2, "ERR: can\'t accept client: Too ma"..., 46) = 46
| poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN, revents=POLLIN}], 2, -1) = 1

EMFILE == The per-process limit of open file descriptors has been reached. So we really need lsof to find out why acpid is opening so many files. On my system it's:

strolchi:~ # lsof|grep ^acpid | wc -l
14

hald-addon-acpi is just trying to connect continuously and failing, but this is not the real problem.
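To illustrate the EMFILE condition seen in the strace, here is a minimal sketch in C. It is not acpid code: errno_after_leaking is a hypothetical helper that lowers the soft descriptor limit and then opens /dev/null without ever closing it, the way a leaking daemon would, until open() fails.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

#define MAX_LEAKED 256

/*
 * Hypothetical helper (illustration only, not part of acpid).
 * Lowers the soft RLIMIT_NOFILE to `limit`, then keeps opening /dev/null
 * without closing. Returns the errno of the first failed open(), or 0 if
 * MAX_LEAKED opens succeeded without hitting the limit.
 */
int errno_after_leaking(rlim_t limit)
{
    struct rlimit old, lowered;
    int fds[MAX_LEAKED];
    int count = 0;
    int err = 0;

    if (getrlimit(RLIMIT_NOFILE, &old) != 0)
        return errno;

    lowered = old;
    if (limit < lowered.rlim_cur)
        lowered.rlim_cur = limit;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
        return errno;

    while (count < MAX_LEAKED) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            err = errno;        /* expected: EMFILE once the limit is hit */
            break;
        }
        fds[count++] = fd;      /* "leak": keep the descriptor open */
    }

    /* Clean up so the caller is left in its original state. */
    while (count > 0)
        close(fds[--count]);
    setrlimit(RLIMIT_NOFILE, &old);

    return err;
}
```

Once the limit is reached, accept() fails the same way as open() does here. Because the pending connection stays in the listen queue, poll() keeps reporting the listening socket as readable, which would explain the tight loop and the flood of log lines.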
When I run watch "lsof | grep ^acpid | wc -l", it seems that acpid leaks two file handles every 15 seconds.
Created attachment 50693 [details] strace of acpid before the file handle limit is reached. During this strace, 'lsof | grep ^acpid | wc -l' was less than 300, but increased by two every 15 seconds or so.
Created attachment 50694 [details] output of lsof | grep ^acpid while the number of open files was increasing
Just a shot in the dark: do you see the same behaviour if you unplug the second battery?
I can reproduce this: acpid leaks a file descriptor on every "client xxx has disconnected". Investigating. You could try to find out who is connecting and disconnecting every 15 seconds, but the root of the evil is in acpid itself. Reassigning to maintainer.
I believe that the fix is quite simple:

--- acpid-1.0.4~/event.c 2004-02-03 03:38:52.000000000 +0100
+++ acpid-1.0.4/event.c 2005-09-23 11:05:24.000000000 +0200
@@ -589,6 +589,7 @@
         /* closed */
         acpid_log("client has disconnected\n");
         delist_rule(&client_list, rule);
+        close(client);
         return -1;
     }
     safe_write(client, "\n", 1);
aj, could you please provide a SWAMP id for this issue?
this is a reincarnation of bug #117964 so we should also fix it with a YOU for 10.0. The same swampid as #117964 or a new one?
stupid me
Good. I got the fix confirmed by upstream. acpid is additionally leaking a struct rule at this place, which I will fix in an additional package on Monday. But this one should get you over the weekend (the other leak is just a few bytes for every client ;-)
Reuse the swamp-ID, please.
The YOU update is available; closing the bug.