Bugzilla – Bug 118878
xinetd goes into endless loop eating 100% cpu with tftp
Last modified: 2008-07-16 15:46:39 UTC
Serving tftp with xinetd doesn't work in 10.0: after a while, xinetd goes into an endless loop and doesn't serve connections anymore. I have seen this with the previews, but it also happens with rc4.

2303 root 25 0 3448 1056 872 R 44.8 0.1 205:21.43 xinetd

Maybe it happens when the child in.tftpd exits. This is on nectarine.suse.de.

It spins endlessly here:

recv(5, 0x7fdbbf48, 256, 0) = -1 EAGAIN (Resource temporarily unavailable)
recv(5, 0x7fdbbf48, 256, 0) = -1 EAGAIN (Resource temporarily unavailable)
recv(5, 0x7fdbbf48, 256, 0) = -1 EAGAIN (Resource temporarily unavailable)
recv(5, 0x7fdbbf48, 256, 0) = -1 EAGAIN (Resource temporarily unavailable)

nectarine:~ # lsof -p 2303
COMMAND  PID USER   FD TYPE     DEVICE    SIZE    NODE NAME
xinetd  2303 root  cwd  DIR       3,11    2048       2 /
xinetd  2303 root  rtd  DIR       3,11    2048       2 /
xinetd  2303 root  txt  REG       3,11  206331 1761906 /usr/sbin/xinetd
xinetd  2303 root  mem  REG       3,11   57953 2244848 /lib/libnss_files-2.3.5.so
xinetd  2303 root  mem  REG       3,11 1510653 2258022 /lib/tls/libc-2.3.5.so
xinetd  2303 root  mem  REG       3,11   50935 2244827 /lib/libcrypt-2.3.5.so
xinetd  2303 root  mem  REG       3,11  549208 2258024 /lib/tls/libm-2.3.5.so
xinetd  2303 root  mem  REG       3,11  105018 2244836 /lib/libnsl-2.3.5.so
xinetd  2303 root  mem  REG       3,11   49694  401525 /lib/libwrap.so.0.7.6
xinetd  2303 root  mem  REG        0,0               0 [heap] (stat: No such file or directory)
xinetd  2303 root  mem  REG       3,11  110322  401413 /lib/ld-2.3.5.so
xinetd  2303 root  mem  REG       3,11  217016 2252810 /var/run/nscd/passwd
xinetd  2303 root  mem  REG       3,11  217016 2252811 /var/run/nscd/group
xinetd  2303 root   0r  CHR        1,3            2587 /dev/null
xinetd  2303 root   1r  CHR        1,3            2587 /dev/null
xinetd  2303 root   2r  CHR        1,3            2587 /dev/null
xinetd  2303 root   3r FIFO        0,5           65039 pipe
xinetd  2303 root   4w FIFO        0,5           65039 pipe
xinetd  2303 root   5u IPv4      65054             UDP *:tftp
xinetd  2303 root   6w  REG       3,11    9396   61592 /var/log/xinetd.log
xinetd  2303 root   7u unix 0xee58c8c0           65040 socket
xinetd  2303 root   8u IPv4      65055             TCP *:5901 (LISTEN)
xinetd  2303 root   9u IPv4      65056             TCP *:5801 (LISTEN)
Do you need tftp for your work, or can you live with this till next week? (I'm not here from Wednesday to Friday).
I can surely kill xinetd and restart it, for the time being.
Btw, rcxinetd restart doesn't work in that case. xinetd handles signals; even SIGSEGV (signal 11) doesn't lead to a crash. Maybe rcxinetd restart should send SIGKILL if xinetd doesn't react.
I couldn't reproduce it yet, but it might be caused by this:

--- xinetd/util.c.orig
+++ xinetd/util.c
@@ -234,7 +234,7 @@
 void drain( int sd )
 {
    char buf[ 256 ] ; /* This size is arbitrarily chosen */
-   char cc ;
+   int cc ;
    int old_val ;
 
    /* Put in non-blocking mode so we don't hang. */

Could you try it please? Thanks.
Created attachment 50932
xinetd.deadlock.patch

That would be an unsigned char for us. It should be size_t according to my man page. Will try the patch.
This patch is likely correct, but it doesn't help.
Which patch did you try? size_t in your patch is wrong, you need something signed (ssize_t) to store -1 on error. size_t in glibc is a typedef for unsigned (long) int, depending on 32b/64b architecture.
It's ssize_t; I misread the man page. Will retry tomorrow.
it works with ssize_t
Great. Can you confirm that only xinetd/util.c needs to be patched? The buffers are always far below 2GB, so changing int to ssize_t is not necessary IMO. I'll submit just the small patch from comment #4.
recv does return a ssize_t, so the result should be stuffed in such a variable.
Yes, but everywhere in the xinetd sources the upper limit recv() can return (its 'len' argument) is just a few kilobytes, which fits in an int well. ssize_t vs. int does matter if you have arbitrarily long blocks of memory on e.g. x86_64 (where int is "only" 2GB-1B). Of course the code would look better if the author had used ssize_t in the first place, but it's IMHO not worth patching it in the RPM (I want the patches to be as short as possible -- less headache when updating upstream sources). The problem here was that in xinetd/util.c recv() could return a number between -1 and 256, while an unsigned char can hold 0..255.
I've also added the patch to SLES9-SP3.