Bug 118878

Summary: xinetd goes into endless loop eating 100% cpu with tftp
Product: [openSUSE] SUSE LINUX 10.0 Reporter: Olaf Hering <ohering>
Component: Network Assignee: Michal Marek <mmarek>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None    
Version: Final   
Target Milestone: ---   
Hardware: PowerPC   
OS: Linux   
Whiteboard:
Found By: Development Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: xinetd.deadlock.patch

Description Olaf Hering 2005-09-26 19:41:19 UTC
Serving tftp with xinetd doesn't work in 10.0: after a while, xinetd goes into
an endless loop and doesn't serve connections anymore.
I have seen this with the previews, but it also happens with rc4.

 2303 root      25   0  3448 1056  872 R 44.8  0.1 205:21.43 xinetd

Maybe it happens when the in.tftpd child exits.
This is on nectarine.suse.de.

It spins endlessly here:

recv(5, 0x7fdbbf48, 256, 0)             = -1 EAGAIN (Resource temporarily unavailable)
recv(5, 0x7fdbbf48, 256, 0)             = -1 EAGAIN (Resource temporarily unavailable)
recv(5, 0x7fdbbf48, 256, 0)             = -1 EAGAIN (Resource temporarily unavailable)
recv(5, 0x7fdbbf48, 256, 0)             = -1 EAGAIN (Resource temporarily unavailable)


nectarine:~ # lsof -p 2303
COMMAND  PID USER   FD   TYPE     DEVICE    SIZE    NODE NAME
xinetd  2303 root  cwd    DIR       3,11    2048       2 /
xinetd  2303 root  rtd    DIR       3,11    2048       2 /
xinetd  2303 root  txt    REG       3,11  206331 1761906 /usr/sbin/xinetd
xinetd  2303 root  mem    REG       3,11   57953 2244848 /lib/libnss_files-2.3.5.so
xinetd  2303 root  mem    REG       3,11 1510653 2258022 /lib/tls/libc-2.3.5.so
xinetd  2303 root  mem    REG       3,11   50935 2244827 /lib/libcrypt-2.3.5.so
xinetd  2303 root  mem    REG       3,11  549208 2258024 /lib/tls/libm-2.3.5.so
xinetd  2303 root  mem    REG       3,11  105018 2244836 /lib/libnsl-2.3.5.so
xinetd  2303 root  mem    REG       3,11   49694  401525 /lib/libwrap.so.0.7.6
xinetd  2303 root  mem    REG        0,0               0 [heap] (stat: No such file or directory)
xinetd  2303 root  mem    REG       3,11  110322  401413 /lib/ld-2.3.5.so
xinetd  2303 root  mem    REG       3,11  217016 2252810 /var/run/nscd/passwd
xinetd  2303 root  mem    REG       3,11  217016 2252811 /var/run/nscd/group
xinetd  2303 root    0r   CHR        1,3            2587 /dev/null
xinetd  2303 root    1r   CHR        1,3            2587 /dev/null
xinetd  2303 root    2r   CHR        1,3            2587 /dev/null
xinetd  2303 root    3r  FIFO        0,5           65039 pipe
xinetd  2303 root    4w  FIFO        0,5           65039 pipe
xinetd  2303 root    5u  IPv4      65054             UDP *:tftp 
xinetd  2303 root    6w   REG       3,11    9396   61592 /var/log/xinetd.log
xinetd  2303 root    7u  unix 0xee58c8c0           65040 socket
xinetd  2303 root    8u  IPv4      65055             TCP *:5901 (LISTEN)
xinetd  2303 root    9u  IPv4      65056             TCP *:5801 (LISTEN)
Comment 1 Michal Marek 2005-09-27 10:59:17 UTC
Do you need tftp for your work, or can you live with this till next week? (I'm
not here from Wednesday to Friday).
Comment 2 Olaf Hering 2005-09-27 11:10:00 UTC
I can surely kill xinetd and restart it, for the time being.
Comment 3 Olaf Hering 2005-09-27 11:58:09 UTC
Btw, rcxinetd restart doesn't work in that case. xinetd handles signals; even
sig11 doesn't lead to a crash. Maybe rcxinetd restart should send SIGKILL if
xinetd doesn't react.
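
A rough sketch of that escalation idea in C. This is not taken from rcxinetd; the function name, the return codes and the one-second polling interval are assumptions made for the example. It sends SIGTERM, polls with kill(pid, 0) for a grace period, and only then falls back to SIGKILL.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of rcxinetd): ask a process to terminate
 * and fall back to SIGKILL if it is still present after grace_seconds.
 * Returns 0 if SIGTERM was enough, 1 if SIGKILL had to be sent,
 * and -1 on error.
 */
int stop_with_escalation(pid_t pid, unsigned int grace_seconds)
{
    if (pid <= 0) {                 /* refuse process groups and "everyone" */
        errno = EINVAL;
        return -1;
    }
    if (kill(pid, SIGTERM) < 0)
        return errno == ESRCH ? 0 : -1;   /* already gone counts as stopped */

    for (unsigned int waited = 0; waited < grace_seconds; waited++) {
        if (kill(pid, 0) < 0 && errno == ESRCH)
            return 0;               /* the process exited on its own */
        sleep(1);
    }
    if (kill(pid, 0) < 0 && errno == ESRCH)
        return 0;

    /* Still there after the grace period: SIGKILL cannot be caught. */
    if (kill(pid, SIGKILL) < 0)
        return errno == ESRCH ? 0 : -1;
    return 1;
}
```

Note that kill(pid, 0) keeps succeeding for an exited but unreaped child, so a sketch like this only makes sense for a daemon that is not a child of the caller, e.g. one looked up from a pid file.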
Comment 4 Michal Marek 2005-09-27 12:32:10 UTC
I couldn't reproduce it yet, but it might be caused by this:

--- xinetd/util.c.orig
+++ xinetd/util.c
@@ -234,7 +234,7 @@
 void drain( int sd )
 {
    char buf[ 256 ] ; /* This size is arbitrarily chosen */
-   char cc ;
+   int cc ;
    int old_val ;
 
    /* Put in non-blocking mode so we don't hang. */

Could you try it please? Thanks.
Comment 5 Olaf Hering 2005-09-27 12:44:54 UTC
Created attachment 50932
xinetd.deadlock.patch

That would be an unsigned char for us (plain char is unsigned on PowerPC). It
should be size_t according to my man page.

Will try the patch.
Comment 6 Olaf Hering 2005-09-27 13:20:10 UTC
This patch is likely correct, but it doesn't help.
Comment 7 Michal Marek 2005-10-03 08:38:07 UTC
Which patch did you try? size_t in your patch is wrong, you need
something signed (ssize_t) to store -1 on error. size_t in glibc is a
typedef for unsigned (long) int, depending on 32b/64b architecture.
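
To illustrate the signedness point, a minimal sketch (the helper names are made up for this example and are not part of xinetd): a -1 from recv() stored in a size_t wraps around to the largest representable value, so a "> 0" check still passes, while an ssize_t keeps it negative.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helper: does a recv() result look like "data was read"
 * after being stored in an unsigned size_t? */
int looks_like_data_size_t(ssize_t recv_result)
{
    size_t n = (size_t)recv_result;   /* -1 wraps around to SIZE_MAX */
    return n > 0;
}

/* Same check with the signed type that recv() actually returns. */
int looks_like_data_ssize_t(ssize_t recv_result)
{
    ssize_t n = recv_result;          /* -1 stays -1 */
    return n > 0;
}
```

With these helpers, looks_like_data_size_t(-1) is true and looks_like_data_ssize_t(-1) is false, which is why an error return can only be detected through a signed type.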
Comment 8 Olaf Hering 2005-10-03 10:34:49 UTC
It's ssize_t; I misread the man page. Will retry tomorrow.
Comment 9 Olaf Hering 2005-10-04 10:46:29 UTC
It works with ssize_t.
Comment 10 Michal Marek 2005-10-04 11:22:09 UTC
Great. Can you confirm that only xinetd/util.c needs to be patched? The
buffers are always far below 2GB, so changing int to ssize_t is not
necessary IMO. I'll submit just the small patch from comment #4.
Comment 11 Olaf Hering 2005-10-04 12:44:58 UTC
recv() does return an ssize_t, so the result should be stored in such a variable.
Comment 12 Michal Marek 2005-10-04 13:10:27 UTC
Yes, but everywhere in the xinetd sources the upper limit recv() can return
(its 'len' argument) is just a few kilobytes, which fits in an int well.
ssize_t vs. int does matter if you have arbitrarily long blocks of memory
on e.g. x86_64 (where int is "only" 2GB-1B). Of course the code would
look better if the author had used ssize_t in the first place, but it's IMHO
not worth patching it in the RPM (I want the patches to be as short as
possible -- less headache when updating upstream sources).

The problem here was that in xinetd/util.c recv() could return a number
between -1 and 256, while an unsigned char can hold 0..255.
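
A minimal sketch of that truncation, assuming the drain loop has the shape do { cc = recv(...); } while (cc > 0). The loop body is not shown in this report, so that shape and the helper names below are assumptions. unsigned char is spelled out so the behaviour is the same on architectures where plain char is signed.

```c
#include <sys/types.h>

/* Hypothetical helper: the value that survives when a recv() result
 * is stored in an unsigned char (assuming 8-bit chars). */
int stored_in_uchar(ssize_t recv_result)
{
    unsigned char cc = (unsigned char)recv_result;  /* reduced modulo 256 */
    return cc;
}

/* Would the assumed "while (cc > 0)" condition keep looping
 * with the original unsigned char variable? */
int loop_continues_uchar(ssize_t recv_result)
{
    unsigned char cc = (unsigned char)recv_result;
    return cc > 0;
}

/* Same condition with the int variable from the patch in comment #4. */
int loop_continues_int(ssize_t recv_result)
{
    int cc = (int)recv_result;
    return cc > 0;
}
```

Here stored_in_uchar(-1) is 255, so the EAGAIN error looks like 255 received bytes and the loop never ends, while stored_in_uchar(256) is 0, so a full buffer looks like "nothing left to read". With an int, -1 ends the loop and 256 continues it.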
Comment 13 Michal Marek 2005-10-10 09:27:54 UTC
I've also added the patch to SLES9-SP3.