Bug 117884

Summary: acpid is leaking a file descriptor on every client disconnect
Product: [openSUSE] SUSE LINUX 10.0 Reporter: Bart Whiteley <bwhiteley>
Component: BasesystemAssignee: Forgotten User ZhJd0F0L3x <forgotten_ZhJd0F0L3x>
Status: VERIFIED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Critical    
Priority: P5 - None CC: dkukawka
Version: RC 4   
Target Milestone: ---   
Hardware: Other   
OS: All   
Whiteboard:
Found By: Other Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: gdb stacktraces of both processes
strace of acpid
strace of hald-addon-acpi
hwinfo for Toshiba Tecra 9000
new strace output of acpid
new strace output halld-addon-acpi
strace of acpid before the file handle limit is reached.
output of lsof | grep ^ acpid while the number of open files was increasing

Description Bart Whiteley 2005-09-19 20:57:46 UTC
On my toshiba laptop, acpid and hald-addon-acpi are eating all of the CPU cycles. 
Top looks like this: 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5435 root      25   0  2400  572  308 R 65.1  0.1  10:52.92 acpid
 5711 root      16   0  1788  172  136 S 27.2  0.0   4:21.36 hald-addon-acpi

/var/log/acpid is full of lines like this:
[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files
[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files
[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files
[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files
[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files
[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files
[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files
[Mon Sep 19 14:38:43 2005] ERR: can't accept client: Too many open files

/var/log/acpid grows at about 2MB/sec!. It has filled the / filesystem. 
If I clobber it with "> /var/log/acpid" it starts over and fills the filesystem
again. 

I'll attach gdb and strace output from attaching to these procs. 
It's a vanilla 10.0RC4 install.  Haven't done much with it other than install, 
and suspend to disk + resume.
Comment 1 Bart Whiteley 2005-09-19 21:00:33 UTC
Created attachment 50358 [details]
gdb stacktraces of both processes
Comment 2 Bart Whiteley 2005-09-19 21:01:27 UTC
Created attachment 50359 [details]
strace of acpid
Comment 3 Bart Whiteley 2005-09-19 21:01:50 UTC
Created attachment 50360 [details]
strace of hald-addon-acpi
Comment 4 Bart Whiteley 2005-09-19 21:02:15 UTC
Created attachment 50362 [details]
hwinfo for Toshiba Tecra 9000
Comment 5 Bart Whiteley 2005-09-19 21:17:22 UTC
I just noticed that the strace.acpid.out might not be very usefull since it's
full of 'no space left on device' errors.  However, even after clobbering the
acpid logfile so that there was planty of free space on the device, the behavior
continues.  The acpid problem caused the disk to be full, not the other way
around.  Unfortunately I've since killed acpid so I can't get an strace while
the filesystem isn't full.  :(  I'll get another if it happens again. 
Comment 6 Danny Al-Gaaf 2005-09-20 08:46:08 UTC
please add output of 'lsof | grep acpi'
Comment 7 Bart Whiteley 2005-09-22 16:00:09 UTC
I've got strace output now and the filesystem wasn't full.  Sorry, forgot to 
get lsof output.  :(  I'll get it next time.  Thus far this has happened every 
time I've used the laptop since putting 10.0 on it, so it seems to be a 
serious problem.  
Comment 8 Bart Whiteley 2005-09-22 16:01:41 UTC
Created attachment 50660 [details]
new strace output of acpid
Comment 9 Bart Whiteley 2005-09-22 16:02:41 UTC
Created attachment 50661 [details]
new strace output halld-addon-acpi
Comment 10 Forgotten User ZhJd0F0L3x 2005-09-22 16:24:15 UTC
acpid loops with this:

| accept(4, 0xbf9b380a, [2])              = -1 EMFILE (Too many open files)
| time(NULL)                              = 1127404200
| write(2, "[Thu Sep 22 09:50:00 2005] ", 27) = 27
| write(2, "ERR: can\'t accept client: Too ma"..., 46) = 46
| poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN, revents=POLLIN}], 2, -1) = 1

EMFILE == The per-process limit of open file descriptors has been reached.

So we really need lsof to find out why acpid is opening so many files.

on my system it's:
strolchi:~ # lsof|grep ^acpid | wc -l
14

hald-addon-acpi is just trying to connect continuously and failing, but this is
not the real problem.
Comment 11 Bart Whiteley 2005-09-22 22:06:58 UTC
When I run watch "lsof | grep ^acpid | wc -l", it seems that acpid leaks two 
file handles every 15 seconds.  
Comment 12 Bart Whiteley 2005-09-22 22:09:58 UTC
Created attachment 50693 [details]
strace of acpid before the file handle limit is reached. 

During this strace, 'lsof | grep ^ acpid | wc -l' was less than 300, but
increased by two every 15 seconds or so.
Comment 13 Bart Whiteley 2005-09-22 22:10:50 UTC
Created attachment 50694 [details]
output of lsof | grep ^ acpid while the number of open files was increasing
Comment 14 Timo Hoenig 2005-09-23 07:16:55 UTC
Just into the dark: Do you encounter the same behaviour if you unplug the second battary?
Comment 15 Forgotten User ZhJd0F0L3x 2005-09-23 08:59:18 UTC
i reproduce this: acpid leaks a filedescriptor on every "client xxx has
disconnected". Investigating.

You could try to check out, who is connecting and disconnecting every 15
seconds, but the root of evil is in acpid itself.

Reassigning to maintainer.
Comment 16 Forgotten User ZhJd0F0L3x 2005-09-23 09:40:23 UTC
I believe that the fix is quite simple:

--- acpid-1.0.4~/event.c        2004-02-03 03:38:52.000000000 +0100
+++ acpid-1.0.4/event.c 2005-09-23 11:05:24.000000000 +0200
@@ -589,6 +589,7 @@
                /* closed */
                acpid_log("client has disconnected\n");
                delist_rule(&client_list, rule);
+               close(client);
                return -1;
        }
        safe_write(client, "\n", 1);
Comment 18 Timo Hoenig 2005-09-23 10:49:43 UTC
aj, could you please provide a SWAMP id for this issue?
Comment 19 Forgotten User ZhJd0F0L3x 2005-09-23 10:56:49 UTC
this is a reincarnation of bug #117964 so we should also fix it with a YOU for
10.0. The same swampid as #117964 or a new one?
Comment 20 Forgotten User ZhJd0F0L3x 2005-09-23 11:20:18 UTC
stupid me
Comment 22 Forgotten User ZhJd0F0L3x 2005-09-23 22:33:45 UTC
Good. I got the fix confirmed from upstream, acpid is additionally leaking a
struct rule at this place which i will fix in an additional package on monday.
But this one should get you over the weekend (the other leak are just few bytes
for every cliene ;-)
Comment 23 Andreas Jaeger 2005-09-26 08:21:07 UTC
Reuse the swamp-ID, please.
Comment 24 Danny Al-Gaaf 2005-10-10 13:31:11 UTC
You update is available, close the bug