Bugzilla – Bug 545995
PAM/nscd prevents logins on systems with less than 130MB memory
Last modified: 2009-11-13 15:34:21 UTC
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.23) Gecko/20090912 SUSE/1.1.18-1.1 SeaMonkey/1.1.18 On my server I run several VMs for small tasks. Today I found that I could no more log into my openSUSE-11.1 VM running nothing but bind9 which has reasonable 64MB RAM and 150MB swap. After testing I found that cause was a "zypper up" which updated nscd yesterday and obviously also did an "insserv nscd" in the process. Booting your normal system with mem=100M could probably also trigger this. Reproducible: Always Steps to Reproduce: 1. have x86_64 OS and 3GB free disk space to run my VM: zypper in qemu-svn #0.11 wget -nc http://www1.zq1.de/bernhard/temp/opensuse-11.1-bad.img.gz gzip -cd opensuse-11.1-bad.img.gz > opensuse-11.1-bad.img 2. qemu-system-x86_64 -hda opensuse-11.1-bad.img -m 128 3. wait til it is booted up, then login root root 4. and test rcnscd start su - ssh root@localhost rcnscd stop Actual Results: # su - su: user root does not exist # ssh root@localhost You don't exist, go away! # tail /var/log/messages ... cannot write to database file /var/run/nscd/services: Cannot allocate memory Expected Results: # su - # # ssh root@localhost Password: Last login: ... Have a lot of fun... Further tests showed that even with 140MB RAM there are errors in /var/log/messages. "free" shows below 20MB used by apps. Effects and error messages vary with amount of free/used RAM. With 64MB RAM+swap, the error messages are even worse, as they never mention nscd - just non-existing users "root" and "sshd" for priviledge separation. also tried setting up extra swap with dd if=/dev/zero of=/tmp/swap bs=1M count=192 mkswap /tmp/swap swapon /tmp/swap and found that at 100MB even without nscd, ssh works only when swap is added at boot time (i.e. via /etc/fstab). So maybe PAM causes this? I adjusted /etc/init.d/boot:111 to provide enough inodes in /dev even on low-mem systems. My old openSUSE-10.3 system without nscd ran fine (apart from some memory leaks) with 32MB for over a year.
I have done some test row at varying levels of total RAM using qemu-kvm. what I did was very much like the above: login as root rcnscd start # to trigger the bug su - what I found is only partially making sense to me last time I saw that at 128M su fails at 140M su succeeds but syslog has cannot write to database file /var/run/nscd/hosts: No such file or directory cannot write to database file /var/run/nscd/services: Cannot allocate memory at 180M su succeeds but syslog has cannot write to database file /var/run/nscd/services: Cannot allocate memory at 198M su succeeds but syslog has could not start clean-up thread; terminating at 205M su succeeds but syslog has could not start any worker thread; terminating at 209M su succeeds at 210-215M su fails badly # su - su: user root does not exist and no message is added to syslog at 216M and above, su succeeds without error again. Do you have any way to explain how something can fail with more memory? btw: stopping other services or occupying extra RAM did not seem to alter behaviour.
I also did at 215M strace id root socket(PF_FILE, 0x80801 /* SOCK_??? */, 0) = 3 connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0 sendto(3, "\2\0\0\0\v\0\0\0\7\0\0\0passwd\0", 19, MSG_NOSIGNAL, NULL, 0) = 19 poll([{fd=3, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=3, revents=POLLIN|POLLHUP}]) recvmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"passwd\0", 7}, {"\270O\3\0\0\0\0\0", 8}], msg_controllen=24, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, {4}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 15 mmap(NULL, 217016, PROT_READ, MAP_SHARED, 4, 0) = 0x7fa9f529a000 close(4) = 0 close(3) = 0 socket(PF_FILE, 0x80801 /* SOCK_??? */, 0) = 3 connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0 sendto(3, "\2\0\0\0\0\0\0\0\5\0\0\0root\0", 17, MSG_NOSIGNAL, NULL, 0) = 17 poll([{fd=3, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=3, revents=POLLIN|POLLHUP}]) read(3, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0"..., 36) = 36 close(3) = 0 then it wrote "id: root: No such user" but at 216M "id root" does socket(PF_FILE, 0x80801 /* SOCK_??? */, 0) = 3 connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0 sendto(3, "\2\0\0\0\v\0\0\0\7\0\0\0passwd\0", 19, MSG_NOSIGNAL, NULL, 0) = 19 poll([{fd=3, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=3, revents=POLLIN|POLLHUP}]) recvmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"passwd\0", 7}, {"\270O\3\0\0\0\0\0", 8}], msg_controllen=24, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, {4}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 15 mmap(NULL, 217016, PROT_READ, MAP_SHARED, 4, 0) = 0x7f3b7b394000 close(4) = 0 close(3) = 0 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3b7b393000 socket(PF_FILE, 0x80801 /* SOCK_??? */, 0) = 3 connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0 sendto(3, "\2\0\0\0\f\0\0\0\6\0\0\0group\0", 18, MSG_NOSIGNAL, NULL, 0) = 18 poll([{fd=3, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=3, revents=POLLIN|POLLHUP}]) recvmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"group\0", 6}, {"\270O\3\0\0\0\0\0", 8}], msg_controllen=24, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, {4}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 14 mmap(NULL, 217016, PROT_READ, MAP_SHARED, 4, 0) = 0x7f3b7b35e000 close(4) = 0 close(3) In the second part I can see an extra fstat to my pts and writing "group" instead of "root" onto the socket. So something on the requesting side is already different.
re-assign to nscd maintainer
I'm sorry, 11.1 is in maintenance mode where we don't have the resources to fix bugs like this ourselves, especially since 11.2 uses a completely different nscd daemon (unscd). If this issue still persists for you in 11.2, I will look into it. Alternatively, you may report the issue upstream (nscd in 11.1 is part of glibc).