Bug 387202 - nscd keeps crashing in mem.c
Summary: nscd keeps crashing in mem.c
Status: RESOLVED FIXED
Duplicates: 157078 374990 388435 417865 426396 426679 439210 446233 467393
Alias: None
Product: openSUSE 11.0
Classification: openSUSE
Component: Basesystem
Version: Final
Hardware: x86-64 Other
Priority: P3 - Medium
Severity: Major (18 votes)
Target Milestone: Future/Later
Assignee: Petr Baudis
QA Contact: E-mail List
URL:
Whiteboard: maint:released:11.0:21210 maint:rele...
Keywords:
Depends on:
Blocks: 266219
Reported: 2008-05-06 13:25 UTC by Michal Marek
Modified: 2010-02-03 09:19 UTC (History)
27 users (show)

See Also:
Found By: Development
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
nscd log (1.93 KB, text/plain)
2008-05-06 13:27 UTC, Michal Marek
nscd core file (263.95 KB, application/x-gzip)
2008-07-03 11:08 UTC, Bob Vickers
NSCD debug log (14.38 KB, application/octet-stream)
2008-08-18 16:38 UTC, James Faulkner
NSCD debug log 2 (369.05 KB, application/octet-stream)
2008-08-18 16:39 UTC, James Faulkner
sample watchdog script (1018 bytes, application/octet-stream)
2008-12-08 09:33 UTC, Bob Vickers
nscd core file plus log messages and config files (220.00 KB, application/x-tar)
2009-01-06 11:58 UTC, Bob Vickers
nscd backtrace (2.58 KB, application/x-gzip)
2009-02-06 08:32 UTC, Roland Bernet
nscd core, nscd.conf, nsswitch.conf (207.92 KB, application/x-gzip)
2009-02-10 10:34 UTC, Roland Bernet
nscd core, nscd.conf, nsswitch.conf (212.16 KB, application/x-tbz)
2009-02-12 21:00 UTC, Hans-Peter Jansen

Description Michal Marek 2008-05-06 13:25:46 UTC
Hi,

on my machine, nscd always crashes with an assertion after some time:

25033: handle_request: request received (Version = 2) from PID 25253
25033:  GETFDPW
25033: provide access to FD 5, for passwd
25033: Reloading "0" in password cache!
25033: Reloading "10020" in password cache!
25033: remove GETPWBYNAME entry "mmarek"
25033: remove GETPWBYUID entry "10020"
nscd: mem.c:399: gc: Assertion `next_hash == &he[db->head->nentries]' failed.

or

25991: handle_request: request received (Version = 2) from PID 26107
25991:  GETPWBYNAME (nobody)
25991: Haven't found "nobody" in password cache!
25991: Reloading "mmarek" in password cache!
25991: remove GETPWBYNAME entry "mmarek"
25991: remove GETPWBYUID entry "10020"
nscd: mem.c:392: gc: Assertion `off_alloc == off_allocend' failed.
Comment 1 Michal Marek 2008-05-06 13:27:19 UTC
Created attachment 212695 [details]
nscd log

log output from the last run. I did
rm /var/run/nscd/*
/usr/sbin/nscd -d 2>&1 | tee log-nscd
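
A log captured this way can be summarised quickly before attaching it. The following is only a sketch: the helper name count_gc_assertions is hypothetical, and it assumes the assertion lines have exactly the shape quoted in the description ("nscd: mem.c:NNN: gc: Assertion ...").

```shell
# Hypothetical helper: count gc() assertion failures per mem.c line number.
# The default log name matches the tee target used above; pass another path
# as the first argument if the log lives elsewhere.
count_gc_assertions() {
    log=${1:-log-nscd}
    # Matching lines look like:
    #   nscd: mem.c:399: gc: Assertion `...' failed.
    # Splitting on ':' puts the line number in field 3.
    grep -E '^nscd: mem\.c:[0-9]+: gc: Assertion' "$log" |
        cut -d: -f3 |
        sort -n | uniq -c |
        awk '{ print "mem.c:" $2 " x" $1 }'
}
```

Usage would be something like `count_gc_assertions log-nscd`, which prints one line per distinct assertion site together with how often it fired.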
Comment 4 Michal Marek 2008-05-06 13:31:27 UTC
Petr?
Comment 5 Petr Baudis 2008-06-25 23:30:38 UTC
Hmm, do you still encounter this with the 11.0 nscd?
Comment 6 Michal Marek 2008-06-26 12:14:42 UTC
Yes.
Comment 8 Michal Marek 2008-06-26 12:16:34 UTC
$ rpm -q nscd
nscd-2.8-15
Comment 9 Petr Baudis 2008-06-27 00:38:11 UTC
*** Bug 388435 has been marked as a duplicate of this bug. ***
Comment 10 Bob Vickers 2008-07-03 11:08:09 UTC
Created attachment 225788 [details]
nscd core file

I too am seeing many nscd crashes, sometimes every few minutes, and this stops Thunderbird from working. I have attached a core file: is there any other information that would be useful? I am also happy to test any fixes that might be available.

nscd is version 2.8-14.1 running on Opensuse 11.0 x86_64.

Bob
Comment 11 Petr Baudis 2008-07-03 23:04:49 UTC
I can reproduce this myself; I just haven't figured out what the bug is yet. I'm still working on it.
Comment 12 Jon Nelson 2008-07-20 13:05:52 UTC
Does this help?

From /var/log/nscd.log (enabled by hand):

17429: pruning services cache; time 1216524649
17429: considering GETSERVBYPORT entry "`nɑ/tcp", timeout 1216552141
17429: considering GETSERVBYPORT entry " 372^K211e^?/tcp", timeout 1216552130
17429: considering GETSERVBYPORT entry "@/خ/tcp", timeout 1216551996
17429: considering GETSERVBYPORT entry " 272L?354^?/tcp", timeout 1216552070
17429: considering GETSERVBYPORT entry " 332f8343^?/tcp", timeout 1216552142
17429: considering GETSERVBYPORT entry " 252372rU^?/tcp", timeout 1216551945
17429: considering GETSERVBYPORT entry " e5301/tcp", timeout 1216552050
17429: considering GETSERVBYPORT entry "0214247^A/tcp", timeout 1216552119
17429: considering GETSERVBYPORT entry " 312Q=i^?/tcp", timeout 1216552080
17429: considering GETSERVBYPORT entry " ^ZQ356^C^?/tcp", timeout 1216552070
17429: considering GETSERVBYNAME entry "netbios-ns/tcp", timeout 1216552463
17429: considering GETSERVBYNAME entry "bootps/udp", timeout 1216552463
17429: considering GETSERVBYPORT entry "", timeout 1216551905
17429: considering GETSERVBYPORT entry "220u=O/tcp", timeout 1216552098
17429: considering GETSERVBYPORT entry " 272237303^?^?/tcp", timeout 1216552087
17429: considering GETSERVBYPORT entry " 352317/,^?/tcp", timeout 1216552080
17429: considering GETSERVBYPORT entry " :_^^352^?/tcp", timeout 1216551945
17429: considering GETSERVBYNAME entry "ipp/udp", timeout 1216552463
17429: considering GETSERVBYPORT entry "260316317^K/tcp", timeout 1216552087
17429: considering GETSERVBYPORT entry "`fIESC/tcp", timeout 1216552113
17429: considering GETSERVBYPORT entry "@@347^X/tcp", timeout 1216552391
17429: considering GETSERVBYPORT entry "320315301313/tcp", timeout 1216552087
17429: considering GETSERVBYPORT entry "321^B", timeout 1216551905
17429: considering GETSERVBYPORT entry "pVK261/tcp", timeout 1216552050
17429: considering GETSERVBYPORT entry "^P!365(/tcp", timeout 1216552391
17429: considering GETSERVBYPORT entry " *323 247^?/tcp", timeout 1216552391
17429: considering GETSERVBYPORT entry " 212kb267^?/tcp", timeout 1216551934
17429: considering GETSERVBYNAME entry "netbios-ssn/tcp", timeout 1216552463
17429: considering GETSERVBYPORT entry "^Pr^E240/tcp", timeout 1216552130
17429: considering GETSERVBYPORT entry "@322^FH/tcp", timeout 1216552097
17429: considering GETSERVBYPORT entry " ʧuI^?/tcp", timeout 1216552113
17429: considering GETSERVBYPORT entry " 272255^CM^?/tcp", timeout 1216552087
17429: considering GETSERVBYPORT entry " ʻ205367^?/tcp", timeout 1216551905
...
and then it dies a little bit later.
Comment 13 James Faulkner 2008-08-18 16:38:03 UTC
Created attachment 233949 [details]
NSCD debug log

I am also seeing this bug on 2 systems which use LDAP account information from a RHEL 5 server.  I'm attaching the first system's nscd debug log.
Comment 14 James Faulkner 2008-08-18 16:39:26 UTC
Created attachment 233951 [details]
NSCD debug log 2

the 2nd system's NSCD debug log.
Comment 15 James Faulkner 2008-08-18 16:43:56 UTC
NSCD is pretty critical for reducing the load on my LDAP server.  I would be happy to run some test cases or debug code for you if you want.  I have no trouble crashing nscd very quickly on my OpenSUSE 11.0 systems.
Comment 16 Jon Schewe 2008-08-22 16:49:47 UTC
I too am having the same problem. I'm using LDAP for account information and kerberos for passwords. I'm seeing nscd crash on all of my servers at least every 15 minutes (I've got a script set up to restart it every 5 minutes if it's dead). I'm having this problem both in dom0 and in domU on xen as well as at home on my non-xen systems.

I installed libnscd-debuginfo and then ran nscd -d in gdb and got the following:
...
6685: Reloading "103" in password cache!
6685: Reloading "13" in password cache!
6685: Reloading "100" in password cache!
6685: remove GETPWBYUID entry "0"
6685: remove GETPWBYNAME entry "root"
nscd: mem.c:399: gc: Assertion `next_hash == &he[db->head->nentries]' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x4103f950 (LWP 6688)]
0x00007fe6907ce5c5 in raise () from /lib64/libc.so.6
(gdb) where
#0  0x00007fe6907ce5c5 in raise () from /lib64/libc.so.6
#1  0x00007fe6907cfbb3 in abort () from /lib64/libc.so.6
#2  0x00007fe6907c71e9 in __assert_fail () from /lib64/libc.so.6
#3  0x00007fe691362b68 in ?? () from /usr/sbin/nscd
#4  0x00007fe691361494 in ?? () from /usr/sbin/nscd
#5  0x00007fe6913582c6 in ?? () from /usr/sbin/nscd
#6  0x00007fe690d14040 in start_thread () from /lib64/libpthread.so.0
#7  0x00007fe69086f0cd in clone () from /lib64/libc.so.6
(gdb) 

Unfortunately it doesn't appear there is a debuginfo package for nscd, so this doesn't help quite as much as I'd hoped.
Comment 17 Jon Schewe 2008-09-04 13:50:53 UTC
Which debuginfo packages would include the appropriate symbols to be able to get function names from the errors of nscd shown above?
Comment 18 Marc Schütz 2008-09-25 12:39:18 UTC
(In reply to comment #17 from Jon Schewe)
> Which debuginfo packages would include the appropriate symbols to be able to
> get function names from the errors of nscd shown above?
> 

glibc-debuginfo
Comment 19 Jon Schewe 2008-09-25 13:29:26 UTC
Thanks. Now I've got a real stack trace to share. Took all of 10 minutes for it to crash this time.
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...
(gdb) run -d
Starting program: /usr/sbin/nscd -d
[Thread debugging using libthread_db enabled]
[New Thread 0x7f8d562276f0 (LWP 818)]
[New Thread 0x4102f950 (LWP 821)]
[New Thread 0x42112950 (LWP 822)]
[New Thread 0x415a7950 (LWP 823)]
[New Thread 0x417a8950 (LWP 824)]
[New Thread 0x419a9950 (LWP 825)]
[New Thread 0x41baa950 (LWP 826)]
[New Thread 0x40584950 (LWP 827)]
[New Thread 0x40785950 (LWP 828)]
818: Reloading "root" in group cache!
818: remove INITGROUPS entry "root"
nscd: mem.c:392: gc: Assertion `off_alloc == off_allocend' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x42112950 (LWP 822)]
0x00007f8d556be5c5 in *__GI_raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
	in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) where
#0  0x00007f8d556be5c5 in *__GI_raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f8d556bfbb3 in *__GI_abort () at abort.c:88
#2  0x00007f8d556b71e9 in *__GI___assert_fail (
    assertion=0x7f8d5625b3d0 "off_alloc == off_allocend", 
    file=0x7f8d5625b379 "mem.c", line=392, function=0x7f8d5625b450 "gc")
    at assert.c:78
#3  0x00007f8d56252ba6 in gc (db=0x7f8d5645f200) at mem.c:392
#4  0x00007f8d56251494 in prune_cache (table=0x7f8d5645f200, now=1222348776, 
    fd=-1) at cache.c:499
#5  0x00007f8d562482c6 in nscd_run_prune (p=<value optimized out>)
    at connections.c:1390
#6  0x00007f8d55c04040 in start_thread (arg=<value optimized out>)
    at pthread_create.c:297
#7  0x00007f8d5575f0cd in clone () from /lib64/libc.so.6
(gdb) print off_alloc
$1 = 1436420560
(gdb) print off_allocend
$2 = 512
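
For reporters who already have a core file rather than a live gdb session, roughly the same backtrace can be pulled out non-interactively. This is a minimal sketch, assuming glibc-debuginfo is installed as suggested in comment 18; the function name core_backtrace is made up for illustration, and the binary and core paths are whatever applies on the reporter's machine.

```shell
# Hypothetical helper: print backtraces of all threads from an nscd core file.
# Arguments: path to the nscd binary, path to the core file.
core_backtrace() {
    bin=$1
    core=$2
    # Refuse early if the core is missing, so gdb is not started for nothing.
    if [ ! -r "$core" ]; then
        echo "core file not readable: $core" >&2
        return 1
    fi
    # -batch exits after running the commands given with -ex.
    gdb -batch -ex 'thread apply all bt' "$bin" "$core"
}
```

For example, `core_backtrace /usr/sbin/nscd ./core > nscd-backtrace.txt` would produce a text file that is much smaller than the core itself.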
Comment 20 Karsten Kuenne 2008-10-08 01:10:02 UTC
Looks like Ubuntu has the same bug (#271423). But no solution there either. 
Comment 21 Federico Vecchiarelli 2008-10-13 04:23:52 UTC
For me nscd is particularly important because I'm using it for offline LDAP authentication. So far I'm using a watchdog to restart it when it dies. Opensuse 11.0 x64.
Comment 22 Bernd Nies 2008-10-21 09:19:36 UTC
We're having the same problem. OpenSUSE 11.0 (i386 and x86_64) with LDAP for authentication and NFS automount tables. It dies frequently within less than an hour. 

As a result, one cannot start Thunderbird (segfaults) or VMware Workstation 6.0.5 (freezes) as an LDAP user after nscd has died. As a local user it works. See also bug#157078 and http://bugs.gentoo.org/show_bug.cgi?id=223205.

The only workaround for us is so far a watchdog daemon that restarts nscd every time it crashes. I can provide you a strace of nscd the next time it crashes.

Best regards,
Bernd
Comment 23 Jon Nelson 2008-10-21 12:55:53 UTC
I gave up on nscd and have been using unscd - http://busybox.net/~vda/unscd/ - and it seems to work just great. Last time I checked it had been up for a month.

Comment 24 Jaroslaw Zachwieja 2008-10-22 12:36:53 UTC
Created SRPM, minimal testing on 11.0:

https://bugzilla.novell.com/show_bug.cgi?id=157078#c73
Comment 25 Achim Mildenberger 2008-10-30 09:40:17 UTC
Seems I ran into the same stability problem.
I have 30 PCs running openSuSE 11.0 on 64 bit.
I started logging of nscd now.

A fix to the problem would be very welcome.
Comment 26 Bernd Nies 2008-10-30 09:56:44 UTC
Our workaround is a watchdog daemon that restarts nscd:

==CUT==
watch_procs="/usr/sbin/nscd"
( while true; do
  for proc in $watch_procs; do
    if ! checkproc $proc; then
      logger -t watchdog "Restarting $proc."
      start_daemon $proc
    fi
  done
  sleep 60
done ) &
==CUT==

Nscd crashes up to eight times daily:

adnws001:~ # grep nscd /var/log/messages
Oct 27 06:16:19 adnws001 watchdog: Restarting /usr/sbin/nscd.
Oct 27 08:46:21 adnws001 watchdog: Restarting /usr/sbin/nscd.
Oct 27 12:46:24 adnws001 watchdog: Restarting /usr/sbin/nscd.
Oct 27 22:31:36 adnws001 watchdog: Restarting /usr/sbin/nscd.
Oct 28 04:46:40 adnws001 watchdog: Restarting /usr/sbin/nscd.
Oct 28 09:36:43 adnws001 watchdog: Restarting /usr/sbin/nscd.
Oct 28 11:01:44 adnws001 watchdog: Restarting /usr/sbin/nscd.
Oct 28 13:05:49 adnws001 watchdog: Restarting /usr/sbin/nscd.
Oct 28 16:34:35 adnws001 watchdog: Restarting /usr/sbin/nscd.
Oct 28 16:36:36 adnws001 watchdog: Restarting /usr/sbin/nscd.
Oct 28 17:41:02 adnws001 watchdog: Restarting /usr/sbin/nscd.
Oct 28 22:40:04 adnws001 watchdog: Restarting /usr/sbin/nscd.
Oct 29 04:47:06 adnws001 watchdog: Restarting /usr/sbin/nscd.
Oct 29 11:02:09 adnws001 watchdog: Restarting /usr/sbin/nscd.
Oct 29 12:43:10 adnws001 watchdog: Restarting /usr/sbin/nscd. 
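
The per-day figure can be read off such a log mechanically. The sketch below assumes syslog lines in the format shown above (month and day in the first two fields, a "watchdog: Restarting ... nscd" message); the function name restarts_per_day is hypothetical.

```shell
# Hypothetical helper: count watchdog restarts of nscd per calendar day.
# Reads syslog-style lines on stdin and prints "Mon DD: count" in log order.
restarts_per_day() {
    awk '/watchdog: Restarting .*nscd/ {
             day = $1 " " $2
             if (!(day in n)) order[++k] = day   # remember first-seen order
             n[day]++
         }
         END { for (i = 1; i <= k; i++) print order[i] ": " n[order[i]] }'
}
```

It could be fed with `grep nscd /var/log/messages | restarts_per_day`. Applied to the excerpt above, the busiest day is Oct 28 with eight restarts.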

Comment 27 Bob Vickers 2008-10-30 10:37:32 UTC
Luxury! I used the watchdog approach under SuSE 10.2, but under SuSE 11.0 I see nscd crashing every few minutes, or I did before I disabled it.

Comment 28 Bob Vickers 2008-10-31 12:15:03 UTC
Following the suggestion in https://bugzilla.novell.com/show_bug.cgi?id=387202#c23
and also a private email from Jaroslaw I have installed unscd on a couple of machines. So far it is looking good and if I don't find any problems I'll roll it out to other machines.

I would be interested in hearing a comment from SuSE about unscd as it sounds like it could be the solution to a major headache, but SuSE are much better qualified than I to make that judgement. The drawback is that very few people seem to have tested it so far, and it is a very important piece of software that has to work in a wide variety of environments.

On the other hand, the standard nscd has been notoriously flakey for many years, so the bar isn't very high! 
Comment 29 Achim Mildenberger 2008-10-31 13:08:07 UTC
I employ now the watchdog-approach from Comment #26. 
Many Thanks for the code snippet!

Just for statistical fun on Halloween: 
The average lifetime of nscd here is 144 minutes per SuSE 11.0 box
(averaged over the crashes on 34 moderately loaded boxes during 14 hours).
(Using NIS and DNS, openSuSE 11.0, x86-64, no LDAP.)
Funnily enough, only 11 (of 197) crashes left a kernel
log entry in syslog (mostly segfaults, sometimes "general protection").
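
The 144-minute figure is consistent with the numbers given: 34 boxes observed for 14 hours is 34 * 14 * 60 = 28560 box-minutes, and dividing by 197 crashes gives just under 145, which truncates to 144. A small sketch of that arithmetic (the function name is made up, and integer division is an assumption about how the figure was rounded):

```shell
# Hypothetical helper: mean nscd lifetime in minutes per box.
# Arguments: number of boxes, observation window in hours, total crashes.
mean_lifetime_minutes() {
    boxes=$1
    hours=$2
    crashes=$3
    # Shell arithmetic is integer-only, so the result is truncated.
    echo $(( boxes * hours * 60 / crashes ))
}

mean_lifetime_minutes 34 14 197   # prints 144
```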
Comment 30 Jaroslaw Zachwieja 2008-10-31 13:32:35 UTC
Watchdogs are a spawn of evil. I've rolled out unscd on all 250 desktops now.

Fingers crossed (but still keeping the watchdog alive so I can at least catch any potential issues with unscd).

Bob, did you disable debugging already? How's performance?
Comment 31 Petr Baudis 2008-11-03 12:51:48 UTC
FWIW we are very strongly considering unscd for post-11.1, though nothing is decided yet.
Comment 32 Bob Vickers 2008-11-04 16:39:57 UTC
I am running unscd on several heavily-loaded 11.0 machines now and so far it is looking very good. It hasn't crashed, and every so often I run getent on every account to confirm  it is telling the truth (in the past nscd has suffered from corrupt caches as well as segmentation faults).
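
A periodic check along these lines might look as follows. This is only a sketch of the idea, not the script actually used: it assumes a trusted passwd-format reference file is available (for instance a dump taken directly from the directory), and the function name verify_passwd_cache is hypothetical. The lookup command defaults to the real `getent passwd`.

```shell
# Hypothetical helper: compare every entry of a reference passwd file with
# what the name service (and therefore the cache daemon) currently returns.
# Arguments: reference file, optional lookup command (default: getent passwd).
verify_passwd_cache() {
    reference=$1
    lookup=${2:-getent passwd}
    bad=0
    while IFS= read -r expected; do
        name=${expected%%:*}          # account name is the first field
        actual=$($lookup "$name")     # unquoted on purpose: command + args
        if [ "$actual" != "$expected" ]; then
            echo "MISMATCH for $name" >&2
            bad=$((bad + 1))
        fi
    done < "$reference"
    [ "$bad" -eq 0 ]                  # succeed only if nothing differed
}
```

Run from cron as `verify_passwd_cache /path/to/reference || logger -t cachecheck "cache mismatch"`, it would flag corrupt cache entries as well as missing ones.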
Comment 33 Jon Nelson 2008-11-04 18:26:07 UTC
*If* you are going to try unscd *and* you are using apparmor, you'll need to edit 

/etc/apparmor.d/usr.sbin.nscd

and right after 

  capability net_bind_service,

add:

  capability setgid,
  capability setuid,

for it to work.
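
The same edit can be scripted when many machines need it. The sketch below writes the modified profile to stdout rather than editing in place, so the result can be reviewed first; the function name is hypothetical, and it assumes the profile contains the net_bind_service line exactly once. It is not idempotent: running it on an already patched profile would add the two lines again.

```shell
# Hypothetical helper: print an AppArmor profile with the two extra
# capabilities inserted right after "capability net_bind_service,".
# Argument: path to (a copy of) /etc/apparmor.d/usr.sbin.nscd
add_unscd_capabilities() {
    profile=$1
    awk '
        { print }
        /^[ \t]*capability[ \t]+net_bind_service,/ {
            print "  capability setgid,"
            print "  capability setuid,"
        }
    ' "$profile"
}
```

For example: `add_unscd_capabilities /etc/apparmor.d/usr.sbin.nscd > /tmp/usr.sbin.nscd.new`, then compare the two files before replacing the original.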


Comment 34 Petr Baudis 2008-11-14 10:18:32 UTC
*** Bug 439210 has been marked as a duplicate of this bug. ***
Comment 35 Petr Baudis 2008-11-14 10:20:51 UTC
I'm planning to release packages at http://www.suse.de/~pbaudis/bug-387202
(will mirror out in an hour or two) as maintenance update for 11.0 in a short
while.
Comment 44 Petr Baudis 2008-11-19 18:57:43 UTC
nscd still crashes with this patch and in 11.1 - very seldom for me, much more frequently for others. So I will hold this a little more and try to fix that crash too.
Comment 45 Petr Baudis 2008-11-19 19:04:02 UTC
*** Bug 426396 has been marked as a duplicate of this bug. ***
Comment 46 Petr Baudis 2008-11-19 19:24:29 UTC
*** Bug 417865 has been marked as a duplicate of this bug. ***
Comment 47 Petr Baudis 2008-11-19 19:25:48 UTC
*** Bug 426679 has been marked as a duplicate of this bug. ***
Comment 48 Petr Baudis 2008-12-04 18:19:33 UTC
(Status update: In bug 446233, we have tested the patch to fix this issue and fixed another race condition which kicks in if nscd does not crash because of this one. Some people still report occasional crashes, but I don't have enough data to debug these. I will wait probably until next Tuesday and proceed to submit 11.0 update with all the nscd fixes we have by then, and some small extras.)
Comment 49 Petr Baudis 2008-12-04 18:19:50 UTC
*** Bug 157078 has been marked as a duplicate of this bug. ***
Comment 50 Petr Baudis 2008-12-04 18:31:33 UTC
*** Bug 374990 has been marked as a duplicate of this bug. ***
Comment 51 Walter Haidinger 2008-12-05 07:32:47 UTC
Based on the severity and priority of this bug:
How about adding the watchdog workaround as a patch into the current nscd package and release it as an update?

That is, modify nscd to start a master process which monitors its child(ren), restart them automatically upon death (maybe with a log entry) and have it kill them upon exit. This patch should neither be that much to add nor too difficult.

Of course this is not a real fix and quite ugly, IMHO. However, it would be a _quick_ workaround for all 11.0 installations to make nscd more stable (from the user's point of view), relieving users from implementing a watchdog themselves. It would also buy some time until the real bug is found and squashed.
 
Comment 52 Bob Vickers 2008-12-05 10:36:57 UTC
Just to add that the unscd solution continues to work well for me. It has been running on a number of heavily loaded machines for over a month now without ever crashing, and there has been no sign of bad data.

It is good to know progress is being made on the standard nscd as well.
Comment 53 Achim Mildenberger 2008-12-05 11:48:56 UTC
I also switched to unscd about 4 weeks ago on a pool of 34 machines. I haven't encountered any problem since.

Comment 54 Petr Baudis 2008-12-06 14:08:22 UTC
Walter: We already do have such a watchdog, it's called "init". Just adding nscd -d to /etc/inittab should work fine. :-)
Comment 55 Walter Haidinger 2008-12-07 12:48:38 UTC
I see. So, I guess OpenSUSE 12.0 will scrap all those useless scripts in /etc/init.d and start everything from /etc/inittab, right? Nice.

Maybe I need to clarify this: 
comment #51 _was_ meant seriously, no joke intended!
Comment 56 Bob Vickers 2008-12-08 09:33:15 UTC
Created attachment 258547 [details]
sample watchdog script

In case it is useful, here is my version of the watchdog script, designed to be run as a cron job. It has a couple of good features:
(1) can check other services, not just nscd
(2) uses chkconfig to make sure it only checks services that are meant to be running
Comment 57 Walter Haidinger 2008-12-08 13:00:21 UTC
Nice script but because of comment #54 quite obsolete, isn't it? :-\

No, seriously, if such a wrapper were implemented in nscd itself, _all_ SUSE users would benefit, even those not capable of writing a wrapper themselves and those who cannot find (or are not even aware of) this bugzilla entry.

Again, this could be quickly released as an nscd update until the real fix
is done (which we're waiting for since when? two years?). 

The required patch would only need to do the following (in pseudo-code):

/* signal handler to kill spawned nscd child */
static void on_term(int sig) { kill(child_pid, SIGTERM); _exit(0); }
signal(SIGTERM, on_term);

/* core loop to (re)spawn nscd child */
for (;;) {
   child_pid = fork();
   if (child_pid == 0) {
     nscd_main();  /* run nscd main() */
     _exit(0);     /* child must not fall back into the respawn loop */
   } else {
     wait(NULL);   /* wait for nscd child to exit/die */
     log("restarting nscd child");
   }
}

Would that be too difficult?
Comment 58 Bob Vickers 2008-12-08 13:31:49 UTC
Nice idea but dangerous: imagine some condition that caused nscd to fail as soon as it started. Then the nscd parent would whirl round burning up CPU and your system would be much more messed up than if nscd just died.
Comment 59 Walter Haidinger 2008-12-08 13:56:21 UTC
Then add a sleep() to wait a couple of seconds after each wait() to throttle respawning. This should be usually good practice in watchdog wrappers anyways, so I left it out. I said it's only pseudo-code. Moreover, logging will make you notice the problem.
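
The throttling idea can be illustrated outside nscd as a small shell supervisor. This is a sketch under assumptions, not the proposed nscd patch: the function name supervise, the restart limit and the delay are all made up for illustration, and the supervised command is expected to stay in the foreground (as `nscd -d` does in the logs above).

```shell
# Hypothetical supervisor: rerun a foreground command whenever it exits,
# sleeping between attempts and giving up after a fixed number of exits.
# Arguments: max exits, delay in seconds, then the command and its arguments.
supervise() {
    max_restarts=$1
    delay=$2
    shift 2
    restarts=0
    while :; do
        "$@"                          # blocks until the command exits or dies
        restarts=$((restarts + 1))
        if [ "$restarts" -ge "$max_restarts" ]; then
            echo "giving up after $restarts exits of: $*" >&2
            return 1
        fi
        sleep "$delay"                # throttle respawning
    done
}
```

With `supervise 10 5 /usr/sbin/nscd -d`, a daemon that dies immediately on startup costs at most ten attempts spread over roughly 45 seconds, instead of a tight fork loop.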
Comment 60 Pedro Oliveira 2008-12-09 12:23:41 UTC
Hi!
I've switched to unscd too and it rocks. I'm using it in 32- and 64-bit environments, on a few servers and on my laptop, and have never had a problem with it.

With regular nscd, I have two simple scripts to make it restart without much hassle.

Here is the first:

MartiniMan-lap:~/bin # cat nscd_check
#!/bin/bash
while true ; do
        if [ ! "`pidof nscd`" ] ; then
                echo "`date +%d:%m:%y-%H:%M:%S` restarting nscd"
                sudo rcnscd restart ;
        fi
 sleep 1 ;
done


----------------------------------------------------------------------------------------------------------
Pedro Oliveira                            
IT Consultant                             
Email: pmsoliveira@gmail.com  
URL:   http://pedro.linux-geex.com                
Telefone: +351 96 5867227
----------------------------------------------------------------------------------------------------------
Comment 61 Pedro Oliveira 2008-12-09 13:00:08 UTC
Sorry, I forgot the second script, which makes the previous one start automatically from RC.
Just create this executable file: /etc/init.d/nscd_check


#!/bin/sh                                                                                                                                                                                   
### BEGIN INIT INFO                                                                                                                                                                         
# Provides:          nscd_check                                                                                                                                                             
# Required-Start:    nscd
# Should-Start:
# Required-Stop:
# Should-Stop:
# Default-Start:     3 5
# Default-Stop:      0 1 2 6
# Short-Description: check for nscd
# Description: check if nscd is running and restarts it if not
### END INIT INFO
#

. /etc/rc.status

# Reset status of this service
rc_reset

case "$1" in
    start)
        echo -n "Starting nscd_check"
        nohup /sbin/nscd_check >> /var/log/messages &
        rc_status -v
        ;;
    stop)
        echo -n "Shutting down nscd_check"
        pkill nscd_check
        rc_status -v
        ;;
    restart)
        $0 stop
        $0 start
        rc_status
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
        ;;
esac
rc_exit
#####################################

after this just type: insserv nscd_check

hope it helps.


Comment 62 Egbert König 2008-12-24 14:59:15 UTC
nscd 2.9, as shipped with openSuSE 11.1, crashes too. I am using unscd now. Wouldn't it be reasonable to provide unscd as a patch for openSuSE 11.0 and 11.1?
Comment 63 Jon Nelson 2008-12-29 17:03:38 UTC
nscd remains crashy for me, too (opensuse 11.1)
/me back to using unscd.
Comment 64 Swamp Workflow Management 2009-01-01 17:04:52 UTC
Update released for: glibc, glibc-devel, glibc-html, glibc-i18ndata, glibc-info, glibc-locale, glibc-obsolete, glibc-profile, nscd
Products:
openSUSE 11.0 (debug, i386, i686, ppc, ppc64, x86_64)
Comment 65 Bob Vickers 2009-01-05 13:51:18 UTC
nscd still crashes every hour or so after updating to nscd-2.8-14.2 on SuSE 11.0.

I will reinstate unscd.
Comment 66 Petr Baudis 2009-01-05 16:46:53 UTC
If nscd still crashes for you, please:

(i) Set persistent to 0 for all databases in your /etc/nscd.conf
(ii) /etc/init.d/nscd stop and run ulimit -c unlimited; nscd -d
(iii) When nscd crashes, please post a core here, compress it if it is larger than 1M or so.
(iv) Also post your /etc/nsswitch.conf with the core.

Without this information, I cannot fix any crashes; nscd on 11.0 crashed only once for me so far after this fix, and I don't have quite enough data to debug it yet, it seems.

Thanks!
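
Step (i) can be done with a one-line substitution. The sketch below assumes that "set persistent to 0" means switching each `persistent <service> yes` line to `no`, since the stock nscd.conf documents the option as `persistent <service> <yes|no>`. The function name is hypothetical, and the result goes to stdout so it can be checked before replacing the real file.

```shell
# Hypothetical helper: print nscd.conf with persistence disabled everywhere.
# Argument: path to (a copy of) /etc/nscd.conf
disable_persistence() {
    conf=$1
    # Only uncommented "persistent <db> yes" lines are rewritten; comment
    # lines and lines already set to "no" pass through unchanged.
    sed -E 's/^([[:space:]]*persistent[[:space:]]+[a-z]+[[:space:]]+)yes[[:space:]]*$/\1no/' "$conf"
}
```

For example: `disable_persistence /etc/nscd.conf > /tmp/nscd.conf.new`, review the result, copy it into place, and then continue with step (ii).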
Comment 67 Petr Baudis 2009-01-05 16:55:17 UTC
Egbert König: I plan to package unscd nicely in buildservice in the future, I'm just not sure when I will get to it.

To clarify, there are two bugs: bug 387202 against 11.0 and bug 446233 against 11.1. Since nscd is basically the same in 11.0 and 11.1 by now and I will continue to keep them in sync, I'm going to mark 446233 dupe of this one and bump this one to 11.1; further nscd updates will be released for both 11.0 and 11.1.

Both of these bugs are in fact many different bugs in nscd, (un)fortunately the unfixed ones trigger only rarely so they aren't as easy to debug.
Comment 68 Petr Baudis 2009-01-05 16:55:26 UTC
*** Bug 446233 has been marked as a duplicate of this bug. ***
Comment 69 Jon Nelson 2009-01-05 17:24:45 UTC
If anybody cares, I *have* packaged it (although the packaging needs some work) by using bits from the nscd package.

home:jnelson-suse if you like.

I *have* seen unscd crash, but not the latest version (0.36), which has been very slightly patched to unlink the pidfile and sockets.

I actively solicit improvements.

I'M NOT RESPONSIBLE FOR ANYTHING THAT GOES WRONG.
Comment 70 Petr Baudis 2009-01-05 17:40:11 UTC
Sorry, of course I forgot to mention that - I'm using your work as a base for mine. :)
Comment 71 Bob Vickers 2009-01-06 11:51:45 UTC
(In reply to comment #66 from Petr Baudis)
> If nscd still crashes for you, please:
> 
> (i) Set persistent to 0 for all databases in your /etc/nscd.conf
> (ii) /etc/init.d/nscd stop and run ulimit -c unlimited; nscd -d
> (iii) When nscd crashes, please post a core here, compress it if it is larger
> than 1M or so.
> (iv) Also post your /etc/nsswitch.conf with the core.
> 
> Without this information, I cannot fix any crashes; nscd on 11.0 crashed only
> once for me so far after this fix, and I don't have quite enough data to debug
> it yet, it seems.
> 
> Thanks!
> 

I managed to get another crash, and will attach the requested info. nscd is version 2.8-14.2, running on SuSE 11.0.
Comment 72 Bob Vickers 2009-01-06 11:58:08 UTC
Created attachment 263361 [details]
nscd core file plus log messages and config files
Comment 73 Bernd Nies 2009-01-07 09:57:20 UTC
Hi,

Some good news while everybody is complaining: I installed all Suse 11.0 updates with "zypper update" and rebooted system two days ago and since then nscd keeps running. Before that it crashed every few hours and was restarted with my watchdog daemon.

adnws001:~ # rpm -qa | egrep 'nscd|glibc'
libnscd-2.0.2-81.1
nscd-2.8-14.2
glibc-2.8-14.2
glibc-locale-2.8-14.2
glibc-devel-2.8-14.2
glibc-info-2.8-14.2

adnws001:~ # uname -a
Linux adnws001 2.6.25.18-0.2-pae #1 SMP 2008-10-21 16:30:26 +0200 i686 i686 i386 GNU/Linux

Thanks a lot!
Bye,
Bernd
Comment 74 Michal Marek 2009-01-07 13:42:32 UTC
FWIW, the 11.0 update package works for me so far.
Comment 75 Petr Baudis 2009-01-21 00:35:02 UTC
*** Bug 467393 has been marked as a duplicate of this bug. ***
Comment 76 Hans-Peter Jansen 2009-01-21 22:07:29 UTC
FWIW, another variant, this time with 11.1 (nscd-2.9-2.8):

10765: provide access to FD 12, for hosts
10765: Reloading "die-offenbachs.homelinux.org" in hosts cache!
10765: Reloading "0" in group cache!
10765: Reloading "2222" in group cache!
10765: remove GETHOSTBYNAME entry "localhost"
10765: remove GETPWBYUID entry "51"
10765: remove GETPWBYNAME entry "nobody"
10765: remove GETPWBYUID entry "65534"
10765: remove GETPWBYNAME entry "postfix"
nscd: mem.c:412: gc: Assertion `next_data < &he_data[db->head->nentries]' failed.
Aborted

When nscd has crashed, amarok takes ages to start up (say 5-10 minutes!); with nscd running it takes 2-5 secs.

Now, that B O'B will setup a new world order, these bugs really cry for immediate fixes, Petr.
Comment 77 Petr Baudis 2009-01-21 22:57:13 UTC
Actually, I have just prepared a new round of nscd updates for 11.0 and 11.1, at

http://www.suse.de/~pbaudis/bug-387202-2/

My apologies to those I told earlier that the 11.1 and 11.0 nscd are identical; it turns out that the 11.1 glibc update I prepared last did not actually make it to 11.1. :-( So 11.0 should actually have a much more stable nscd than 11.1 now. I will try to trigger another round of updates now.
Comment 80 Swamp Workflow Management 2009-01-22 22:02:38 UTC
The SWAMPID for this issue is 22192.
Please submit the patch and patchinfo file using this ID.
(https://swamp.suse.de/webswamp/wf/22192)
Comment 81 Carlos Robinson 2009-01-24 00:54:26 UTC
Yet another watchdog.

root's crontab entry:

-0,*/5 * * * 1-7 /root/bin/watchdog_nscd > /dev/null

script:

#!/bin/bash
# watchdog to restart the nscd service

# idea for the case statement taken from "307:rc"
/usr/sbin/rcnscd status start; status=$?
echo "Status= "$status
case $status in
    [1-47])  echo "failed"
         /bin/logger -p user.warn -t watchdog \
            "nscd is not running, restarting. -- Bugzilla 387202; "\
            "see root's crontab to disable this wd"
            /usr/sbin/rcnscd restart
         ;;
    [56])   echo "skipped"
         ;;
    0|*) echo "Nothing to do"
         ;;
esac



I believe you should create some kind of watchdog and push it via YOU to systems, till this problem is really solved.
Comment 82 Carlos Robinson 2009-01-24 09:55:58 UTC
Sorry, errata in #81

"/usr/sbin/rcnscd status start" should be "/usr/sbin/rcnscd status", of course. It's of no consequence, anyway.
Comment 83 Hans-Peter Jansen 2009-01-25 21:38:00 UTC
Petr, for what it's worth, since I installed http://www.suse.de/~pbaudis/bug-387202-2/
nscd hasn't crashed.

A yast compatible repo structure for this dir would ease testing greatly, though. (Well, I use createrepo internally..).
Comment 84 Hans-Peter Jansen 2009-01-28 10:18:10 UTC
Cheered too soon :-(.

Crashed after three days, but I stopped the nscd debugging before my last post.

Will set it up again now.
Comment 85 Swamp Workflow Management 2009-02-02 11:16:48 UTC
Update released for: nscd
Products:
openSUSE 11.0 (i386, ppc, x86_64)
Comment 86 Carlos Robinson 2009-02-03 19:06:59 UTC
I got what I think is that update:

cer@nimrodel:~> rpm -q -i nscd
Name        : nscd                         Relocations: (not relocatable)
Version     : 2.8                               Vendor: SUSE LINUX Products GmbH, Nuernberg, Germany
Release     : 14.4                          Build Date: Sun 25 Jan 2009 10:06:27 PM CET
Install Date: Tue 03 Feb 2009 04:21:08 AM CET      Build Host: stravinsky.suse.de


I had nscd crash twice today - ie, after the update:

Feb  3 15:55:01 nimrodel watchdog: nscd is not running, restarting. -- 
Feb  3 17:55:01 nimrodel watchdog: nscd is not running, restarting. -- 
Feb  3 17:55:01 nimrodel nscd: 12295 invalid persistent database file "/var/run/nscd/passwd": verification failed

admittedly, it is crashing less.
Comment 87 Petr Baudis 2009-02-03 19:14:00 UTC
Carlos, can you please follow the reporting guidelines I outlined in comment 66? Thank you.
Comment 88 Carlos Robinson 2009-02-03 19:43:24 UTC
(In reply to comment #87)
> Carlos, can you please follow the reporting guidelines I outlined in comment
> 66? Thank you.

Let me see...

> (i) Set persistent to 0 for all databases in your /etc/nscd.conf

Huh? I have:

nimrodel:~ # grep -i persistent  /etc/nscd.conf
#       persistent              <service> <yes|no>
        persistent              passwd          yes
        persistent              group           yes
        persistent              hosts           no
        persistent              services        yes

What exactly do I edit? My configuration is your default supplied config, I think.

> (ii) /etc/init.d/nscd stop and run ulimit -c unlimited; nscd -d

Done. I now have in the startup script this:

case "$1" in
    start)
        echo -n "Starting Name Service Cache Daemon"
        #/sbin/startproc -p $NSCD_PID $NSCD_BIN
        # Bug 387202#c66
        ulimit -c unlimited
        /sbin/startproc -p $NSCD_PID $NSCD_BIN -d
        rc_status -v
        ;;

If this is not adequate, please tell me how I change the script - it has to be that way, I have a watchdog restarting the daemon automatically.
[...]
No, it is not adequate, status says "unused". Undoing the "-d" till you expand the instructions.
Comment 89 Roland Bernet 2009-02-03 19:52:59 UTC
Hi Petr,
Tried several times with
  ulimit -c unlimited; nscd -d
and nscd does crash, but I never get a core dump ...

Tried with a small script dividing by 0 and it writes a core.
Any ideas how to get a core dump of a nscd crash?
Comment 90 Hans-Peter Jansen 2009-02-03 23:20:31 UTC
I wasn't able to get the version from #77 to crash as long as I ran it in debug mode, unlike when running it as an ordinary runlevel service.

Since the debug mode prevents nscd from forking, maybe some fork or clone related race condition in nscd is the real culprit in this issue.

Roland, Carlos please keep the ulimit -c unlimited; nscd -d running in a terminal, and be sure, that rcnscd is not running.
Comment 91 Carlos Robinson 2009-02-04 02:01:35 UTC
(In reply to comment #90)

> Roland, Carlos please keep the ulimit -c unlimited; nscd -d running in a
> terminal, and be sure, that rcnscd is not running.

Well, I have done just that, but I still need clarification on "persistent" configuration, as per #88
Comment 92 Carlos Robinson 2009-02-04 03:48:00 UTC
Ok, nscd just crashed. Output in window was:

...
921:   GETFDGR
8921: provide access to FD 6, for group
8921: handle_request: request received (Version = 2) from PID 10892
8921:   GETFDGR
8921: provide access to FD 6, for group
8921: remove GETPWBYUID entry "51"
8921: remove GETPWBYNAME entry "nobody"
8921: remove GETPWBYUID entry "65534"
8921: remove GETPWBYNAME entry "postfix"
nscd: mem.c:368: gc: Assertion `off_allocend <= db->head->first_free' failed.
nimrodel:~/Bugzilla/Bug_387202 # 

There is no core in that directory. Config:

cer@nimrodel:~> cat /etc/nscd.conf | egrep -v "^[[:space:]]*$|^#"
        debug-level             0
        paranoia                no
        enable-cache            passwd          yes
        positive-time-to-live   passwd          600
        negative-time-to-live   passwd          20
        suggested-size          passwd          211
        check-files             passwd          yes
        persistent              passwd          yes
        shared                  passwd          yes
        max-db-size             passwd          33554432
        auto-propagate          passwd          yes
        enable-cache            group           yes
        positive-time-to-live   group           3600
        negative-time-to-live   group           60
        suggested-size          group           211
        check-files             group           yes
        persistent              group           yes
        shared                  group           yes
        max-db-size             group           33554432
        auto-propagate          group           yes
        enable-cache            hosts           yes
        positive-time-to-live   hosts           600
        negative-time-to-live   hosts           0
        suggested-size          hosts           211
        check-files             hosts           yes
        persistent              hosts           no
        shared                  hosts           yes
        max-db-size             hosts           33554432
        enable-cache            services        yes
        positive-time-to-live   services        28800
        negative-time-to-live   services        20
        suggested-size          services        211
        check-files             services        yes
        persistent              services        yes
        shared                  services        yes
        max-db-size             services        33554432
cer@nimrodel:~>
Comment 93 Carlos Robinson 2009-02-04 03:53:21 UTC
I forgot:

cer@nimrodel:~> cat  /etc/nsswitch.conf | egrep -v "^[[:space:]]*$|^#"
passwd: compat
group:  compat
hosts:          files mdns4_minimal [NOTFOUND=return] dns
networks:       files dns
services:       files
protocols:      files
rpc:            files
ethers:         files
netmasks:       files
netgroup:       files nis
publickey:      files
bootparams:     files
automount:      files nis
aliases:        files
cer@nimrodel:~>

And to avoid confusions, I'm on 11.0
Comment 94 Carlos Robinson 2009-02-04 12:48:06 UTC
One more:

11415:  GETFDPW
11415: provide access to FD 4, for passwd
11415: Reloading "0.pool.ntp.org" in hosts cache!
11415: Reloading "1.ch.pool.ntp.org" in hosts cache!
11415: Reloading "0.es.pool.ntp.org" in hosts cache!
11415: Reloading "1.pool.ntp.org" in hosts cache!
11415: Reloading "2.pool.ntp.org" in hosts cache!
11415: Reloading "0.ch.pool.ntp.org" in hosts cache!
11415: Reloading "3.pool.ntp.org" in hosts cache!
11415: Reloading "0.uk.pool.ntp.org" in hosts cache!
11415: Reloading "users.opensuse.org" in hosts cache!
11415: Reloading "0.fr.pool.ntp.org" in hosts cache!
11415: remove GETAI entry "0.pool.ntp.org"
11415: remove GETAI entry "1.ch.pool.ntp.org"
11415: remove GETAI entry "0.es.pool.ntp.org"
11415: remove GETAI entry "1.pool.ntp.org"
11415: remove GETAI entry "2.pool.ntp.org"
11415: remove GETAI entry "0.ch.pool.ntp.org"
11415: remove GETAI entry "nimrodel"
11415: remove GETAI entry "3.pool.ntp.org"
11415: remove GETAI entry "0.uk.pool.ntp.org"
11415: remove GETAI entry "0.fr.pool.ntp.org"
11415: remove GETPWBYNAME entry "upsd"
11415: remove GETPWBYUID entry "115"
nscd: mem.c:477: gc: Assertion `next_hash == &he[db->head->nentries]' failed.
nimrodel:~/Bugzilla/Bug_387202 # 
nimrodel:~/Bugzilla/Bug_387202 # ulimit -c unlimited; nscd -d 
15414: invalid persistent database file "/var/run/nscd/passwd": verification failed
Comment 95 Carlos Robinson 2009-02-04 23:54:25 UTC
One more:

15414: handle_request: request received (Version = 2) from PID 17943
15414:  GETFDGR
15414: provide access to FD 6, for group
15414: remove GETPWBYUID entry "1000"
15414: remove GETPWBYNAME entry "cer"
15414: handle_request: request received (Version = 2) from PID 7657
15414:  GETAI (www.os-translation.com.ar)
15414: remove GETPWBYNAME entry "lp"
15414: remove GETPWBYUID entry "4"
nscd: mem.c:368: gc: Assertion `off_allocend <= db->head->first_free' failed.
Aborted

Another:

21408:  GETFDPW
21408: provide access to FD 4, for passwd
21408: handle_request: request received (Version = 2) from PID 1113
21408:  GETFDPW
21408: provide access to FD 4, for passwd
21408: remove GETPWBYUID entry "101"
21408: remove GETPWBYNAME entry "messagebus"
nscd: mem.c:368: gc: Assertion `off_allocend <= db->head->first_free' failed.
Aborted
Comment 96 Carlos Robinson 2009-02-05 11:53:25 UTC
Another one:

2138: Reloading "0.pool.ntp.org" in hosts cache!
2138: remove GETAI entry "0.pool.ntp.org"
2138: Reloading "1.ch.pool.ntp.org" in hosts cache!
2138: Reloading "0.es.pool.ntp.org" in hosts cache!
2138: Reloading "1.pool.ntp.org" in hosts cache!
2138: Reloading "2.pool.ntp.org" in hosts cache!
2138: Reloading "0.ch.pool.ntp.org" in hosts cache!
2138: Reloading "3.pool.ntp.org" in hosts cache!
2138: Reloading "0.uk.pool.ntp.org" in hosts cache!
2138: Reloading "0.fr.pool.ntp.org" in hosts cache!
2138: remove GETAI entry "1.ch.pool.ntp.org"
2138: remove GETAI entry "0.es.pool.ntp.org"
2138: remove GETAI entry "1.pool.ntp.org"
2138: remove GETAI entry "2.pool.ntp.org"
2138: remove GETAI entry "0.ch.pool.ntp.org"
2138: remove GETAI entry "3.pool.ntp.org"
2138: remove GETAI entry "0.uk.pool.ntp.org"
2138: remove GETAI entry "0.fr.pool.ntp.org"
2138: remove GETPWBYUID entry "51"
2138: remove GETPWBYNAME entry "nobody"
2138: remove GETPWBYUID entry "65534"
2138: remove GETPWBYNAME entry "postfix"
2138: remove GETPWBYNAME entry "upsd"
2138: remove GETPWBYUID entry "115"
Segmentation fault
nimrodel:~/Bugzilla/Bug_387202 # l
total 8
drwxr-xr-x  2 root root 4096 Feb  4 02:00 ./
drwxrwxr-x 33 cer  root 4096 Feb  4 01:58 ../
-rw-r--r--  1 root root    0 Feb  4 04:45 nscd.log


Feb  5 12:40:51 nimrodel kernel: nscd[2139]: segfault at fffffdc4 ip b7f8c822 sp adda5f5c error 4 in nscd[b7f7c000+1c000]


Well, as I see no comments on how to produce that core file, and it keeps crashing, I'm restarting the "normal" service with automatic watchdog restarting the service, instead of "nscd -d" in a terminal. I will not comment further unless I get feedback to the contrary, I see no point.
Comment 97 Hans-Peter Jansen 2009-02-05 12:28:09 UTC
> Well, as I see no comments on how to produce that core file, and it keeps
> crashing, I'm restarting the "normal" service with automatic watchdog
> restarting the service, instead of "nscd -d" in a terminal. I will not
> comment further unless I get feedback to the contrary, I see no point.

You truly have a point here, Carlos. FWIW, your crashes nicely sum up what I see very sporadically here. Petr, I really wonder why you don't provide the nscd debug packages (see #77). Otherwise, similar to Carlos, I see no point in running nscd via gdb...
Comment 98 Michael Matz 2009-02-05 14:18:33 UTC
The debuginfo packages are right there where Petr said in comment #77.
Maybe you are confused that there's no nscd-debug{info,source}?  That's because
debug packages don't exist for subpacks (and nscd is one of glibc),
you need to install glibc-debuginfo (-debugsource).

Also, Carlos: you didn't yet follow the guidelines of comment #66.  You still
use persistent databases.  Yes, the comment talks about setting it to "0".
Of course instead you should use "no" for all databases you have in nscd.conf.
See nscd.conf(5).
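
A minimal sketch of that change, assuming the stock layout of /etc/nscd.conf shown in comment 92 (the helper name disable_persistent is made up for illustration). It prints a modified copy rather than editing the file in place:

```shell
# Sketch only: disable_persistent is a hypothetical helper name.
# Prints the given nscd.conf to stdout with every "persistent DB yes" line
# switched to "no"; comment lines and all other options are left untouched.
disable_persistent() {
    sed -E 's/^([[:space:]]*persistent[[:space:]]+[[:alnum:]_-]+[[:space:]]+)yes[[:space:]]*$/\1no/' "$1"
}

# Review the result before replacing the live file, e.g.:
#   disable_persistent /etc/nscd.conf > /tmp/nscd.conf.new
```

nscd has to be restarted afterwards for the new settings to take effect.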

But indeed, more logs are not necessary I think, we see the assertions
that cause nscd to exit.  core file would be a bit more useful.  They aren't
placed into the current pwd, but into the working dir of nscd, which usually
is '/' (nscd chdir's into that one as daemon).  You probably have some lying
around there.
Comment 99 Swamp Workflow Management 2009-02-05 14:44:46 UTC
Update released for: glibc, glibc-debuginfo, glibc-debugsource, glibc-devel, glibc-html, glibc-i18ndata, glibc-info, glibc-locale, glibc-obsolete, glibc-profile, nscd
Products:
openSUSE 11.1 (debug, i586, i686, ppc, ppc64, x86_64)
Comment 100 Carlos Robinson 2009-02-05 14:58:42 UTC
(In reply to comment #98)
> The debuginfo packages are right there where Petr said in comment #77.
> Maybe you are confused that there's no nscd-debug{info,source}?  That's because
> debug packages don't exist for subpacks (and nscd is one of glibc),
> you need to install glibc-debuginfo (-debugsource).

Well, if you want me to install something in order to produce the coredump, tell me what exactly do I install.

> 
> Also, Carlos: you didn't yet follow the guidelines of comment #66.  You still
> use persistent databases.  Yes, the comment talks about setting it to "0".
> Of course instead you should use "no" for all databases you have in nscd.conf.
> See nscd.conf(5).

No, I didn't, I said in #88 and #91 that I needed clarification. I still do. Do you mean I should use:

        persistent              whatever         no

> But indeed, more logs are not necessary I think, we see the assertions
> that cause nscd to exit.  core file would be a bit more useful.  They aren't
> placed into the current pwd, but into the working dir of nscd, which usually
> is '/' (nscd chdir's into that one as daemon).  You probably have some lying
> around there.

One of the crashes was not an assertion but a segfault. No, there is no core in /. I can run an "updatedb; locate corewhatever", but I need to know the exact name to search for, because it yields upwards of 5713 entries. Or alternatively, an exact "find" command to find it.

As far as I can see, there is no core in:
/
/tmp
/var/run/nscd
/root/Bugzilla/Bug_387202    <-- pwd where I run "nscd -d"
/root
/home/cer

Note: the "nscd -d" command runs on an xterm where I did "su -" to root, in order to keep an eye on it. The xterm is under gnome. I remember some mention years ago of X blocking coredumps. But somebody here said he managed to produce coredumps with a code dividing by zero, so it must be nscd which impedes them :-?

Suggestion: search for all assertions in the code and replace them with, or add, logger calls. In my programming days, an assertion was the last resort, and never used in production code. It was used instead of proper code to find an unexpected situation, never as error handling code.
Comment 101 Hans-Peter Jansen 2009-02-05 19:06:38 UTC
> The debuginfo packages are right there where Petr said in comment #77.
> Maybe you are confused that there's no nscd-debug{info,source}?

Yes.

> That's because
> debug packages don't exist for subpacks (and nscd is one of glibc),
> you need to install glibc-debuginfo (-debugsource).

Done that already. As noted before, it would be MUCH easier for every tester, if Petr would provide a zypp compatible repo structure over there: Then people could add the repo target, update and install additional packages.

> But indeed, more logs are not necessary I think, we see the assertions
> that cause nscd to exit.  core file would be a bit more useful.  They aren't
> placed into the current pwd, but into the working dir of nscd, which usually
> is '/' (nscd chdir's into that one as daemon).  You probably have some lying
> around there.

I got a nscd segfault (a few days ago) too, assertions also, but no core:

> for f in $(locate core | egrep '\<core$'); do [ -f $f ] && l $f; done
lrwxrwxrwx 1 root root 11  7. Jan 22:20 /dev/core -> /proc/kcore
lrwxrwxrwx 1 root root 11 25. Dez 14:30 /lib/udev/devices/core -> /proc/kcore
-rw-r--r-- 1 root root 213  3. Dez 08:26 /var/adm/perl-modules/yast2-core

For whatever reason, something prevents the kernel from creating a nscd core file. Since two people see this behavior, I bet you won't see ANY cores from somebody as long as you cannot tell us how! Read: try to simulate an assertion or segfault with nscd, and get it to produce one. I bet again, that this fails also. Now find the reason, tell us, and we're back into the game.
Comment 102 Roland Bernet 2009-02-06 08:32:05 UTC
Created attachment 270688 [details]
nscd backtrace

nscd crashed again on my openSUSE 11.0 with nscd-2.8-14.4. 
Still no core dump, but this time a backtrace:

23851: provide access to FD 6, for group
23851: Reloading "20915" in group cache!
*** glibc detected *** nscd: corrupted double-linked list: 0xb7f8d6e0 ***
======= Backtrace: =========
/lib/libc.so.6[0xb7de3fc4]
/lib/libc.so.6[0xb7de4264]
.
.

full output in the attached tar.gz file. It also includes my
/etc/nscd.conf and /etc/nsswitch.conf files.

Hope it helps ...
Comment 103 Michael Matz 2009-02-06 14:15:11 UTC
re #101:  core files for multi-thread processes (which nscd is) aren't named "core", but rather "core.$PID", hence your egrep pattern won't find them.

> Read: try to simulate an assertion
> or segfault with nscd, and get it to produce one.

Easy:

% ulimit -c unlimited
% nscd -d &
[1] 28280
% kill -SEGV $!
[1]+  Segmentation fault      (core dumped) nscd -d
% pwd; ls -l core.28280
/
-rw------- 1 root root 42344448 2009-02-06 15:10 core.28280

without debug:
% nscd
% pidof nscd
28304
% kill -SEGV $(pidof nscd)
% ls -l core.28304
-rw-------   1 root root 42344448 2009-02-06 15:12 core.28304

% file core.28280 core.28304
core.28280: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from 'nscd -d'
core.28304: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from 'nscd'
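
Given the core.$PID naming described above, a search that catches both plain "core" and "core.<pid>" files could look like the following. It is a sketch only; the helper name find_cores is made up for illustration, and it looks directly inside one directory ("/" by default, since that is where the daemon changes to).

```shell
# Sketch only: find_cores is a hypothetical helper name.
# Lists regular files named "core" or "core.<digits...>" directly inside the
# given directory (default "/", nscd's working directory as a daemon).
find_cores() {
    find "${1:-/}" -maxdepth 1 -type f \( -name 'core' -o -name 'core.[0-9]*' \)
}

# Examples:
#   find_cores            # look in /
#   find_cores /root      # look in another candidate directory
```

Unlike the egrep '\<core$' pattern, this does not pick up unrelated names such as yast2-core, and the -type f test skips the /dev/core style symlinks.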
Comment 104 Hans-Peter Jansen 2009-02-06 15:28:35 UTC
Michael, not here, unfortunately:

xrated:/# export LANG=C
xrated:/# cat /etc/SuSE-release 
openSUSE 11.1 (i586)
VERSION = 11.1
xrated:/# ulimit -c unlimited
xrated:/# nscd -d &
[1] 8139
xrated:/# kill -SEGV $!
xrated:/# pwd; ls -l core*
/
ls: cannot access core*: No such file or directory
[1]+  Segmentation fault      nscd -d

xrated:/# nscd
xrated:/# pidof nscd
8206
xrated:/# kill -SEGV $(pidof nscd)
xrated:/# ls -l core*
ls: cannot access core*: No such file or directory

xrated:/# uname -a
Linux xrated 2.6.27.7-9-pae #1 SMP 2008-12-04 18:10:04 +0100 i686 athlon i386 GNU/Linux

Do you remember any further details, which may prevent core dumping?
Comment 105 Michal Marek 2009-02-06 15:34:06 UTC
Does 
$ /sbin/sysctl -a | grep kernel.core

show any unusual settings? Like kernel.core_pattern with an absolute path?
Comment 106 Hans-Peter Jansen 2009-02-06 15:37:54 UTC
No.

xrated:/# /sbin/sysctl -a | grep kernel.core
kernel.core_uses_pid = 0
kernel.core_pattern = core
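
For reference, what an "unusual" kernel.core_pattern would mean can be sketched as a small classifier. The function name core_pattern_kind is made up for illustration; the three cases reflect how the Linux kernel interprets the value (a leading "|" pipes the core to a program, a leading "/" is a fixed path, anything else is relative to the crashing process's working directory).

```shell
# Sketch only: core_pattern_kind is a hypothetical helper name.
# Classifies a kernel.core_pattern value by where the core would end up.
core_pattern_kind() {
    case "$1" in
        '|'*) echo "pipe" ;;       # core is piped to a helper program
        /*)   echo "absolute" ;;   # core goes to a fixed path
        *)    echo "relative" ;;   # core goes to the process's working directory
    esac
}

# On Linux:
#   core_pattern_kind "$(cat /proc/sys/kernel/core_pattern)"
```

The value "core" shown above falls into the "relative" case, so it does not explain the missing core files.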
Comment 107 Michael Matz 2009-02-06 16:18:45 UTC
Is there enough space on '/'?  Also note that your segfault message doesn't 
include the "(core dumped)" string, so it's really not even attempting
to dump core.  Very strange.  What does 'ulimit -a' say in that very shell,
after doing 'ulimit -c unlimited' and the forced segfault in nscd?
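
The core size limit part of that question can be checked with a short helper; this is a sketch, and the name core_limit_ok is made up for illustration. It takes the value printed by "ulimit -c" and succeeds only if that value permits a core file at all.

```shell
# Sketch only: core_limit_ok is a hypothetical helper name.
# Succeeds if the given "ulimit -c" value allows core files to be written.
core_limit_ok() {
    case "$1" in
        unlimited)   return 0 ;;
        ''|*[!0-9]*) return 1 ;;        # empty or not a number
        *)           [ "$1" -gt 0 ] ;;  # 0 means core dumps are disabled
    esac
}

# In the same shell that will start nscd:
#   core_limit_ok "$(ulimit -c)" || echo "core dumps are disabled in this shell"
#   df -P /        # free space on the filesystem nscd dumps into
```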
Comment 108 Jon Nelson 2009-02-06 16:40:48 UTC
apparmor may be getting in the way here, too.
Comment 109 Hans-Peter Jansen 2009-02-06 20:57:15 UTC
Bingo, that was the missing hint:

xrated:/# rcapparmor stop
Unloading AppArmor profiles                                                                                  done
xrated:/# nscd -d &
[1] 18566
xrated:/# kill -SEGV $!
xrated:/# ls -l core*
-rw------- 1 root root 143011840 Feb  6 21:27 core.18566
[1]+  Segmentation fault      (core dumped) nscd -d

I've filed a bugzilla report about this silliness:
https://bugzilla.novell.com/show_bug.cgi?id=473529
Would you please vote for it, thanks.

Now back to the 'real' problem, will keep nscd running in 'observation' mode.
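
For anyone else chasing missing cores, a quick way to see whether nscd is running under an AppArmor profile might look like this. It is a sketch under assumptions: the helper name nscd_confined is made up, and the default path is an assumption about where the kernel lists loaded profiles (one "name (mode)" entry per line); adjust it if your system differs.

```shell
# Sketch only: nscd_confined is a hypothetical helper name, and the default
# path below is an assumption about where the kernel exposes loaded profiles.
# Succeeds if a profile whose name ends in "nscd" is listed.
nscd_confined() {
    [ -r "${1:-/sys/kernel/security/apparmor/profiles}" ] &&
        grep -Eq '(^|/)nscd( |$)' "${1:-/sys/kernel/security/apparmor/profiles}"
}

# Example (as root):
#   nscd_confined && echo "nscd has an AppArmor profile loaded"
```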
Comment 110 Roland Bernet 2009-02-10 10:34:29 UTC
Created attachment 271484 [details]
nscd core, nscd.conf, nsswitch.conf

nscd core dump from an openSUSE 11.0 system.
I have in addition added to the tar file
  /etc/nscd.conf, /etc/nsswitch.conf and the standard output.
Comment 111 Hans-Peter Jansen 2009-02-12 21:00:42 UTC
Created attachment 272448 [details]
nscd core, nscd.conf, nsswitch.conf

Here's one, that seems new (11.1 updated):

1827: provide access to FD 12, for hosts
1827: handle_request: request received (Version = 2) from PID 30345
1827:   GETPWBYNAME (root)
1827: handle_request: request received (Version = 2) from PID 30345
1827:   GETPWBYNAME (root)
1827: handle_request: request received (Version = 2) from PID 30345
1827:   GETPWBYNAME (root)
1827: handle_request: request received (Version = 2) from PID 30345
1827:   GETPWBYNAME (root)
1827: handle_request: request received (Version = 2) from PID 30345
1827:   GETPWBYNAME (root)
1827: handle_request: request received (Version = 2) from PID 30346
1827:   GETFDGR
1827: provide access to FD 9, for group
1827: handle_request: request received (Version = 2) from PID 30377
1827:   GETFDHST
1827: provide access to FD 12, for hosts
1827: remove GETAI entry "xrated"
1827: remove GETHOSTBYADDR entry "127.0.0.2"
1827: Reloading "0" in password cache!
1827: remove INITGROUPS entry "root"
nscd: mem.c:477: gc: Assertion `next_hash == &he[db->head->nentries]' failed.
Aborted (core dumped)
Comment 114 Toni Harbaugh-Blackford 2009-02-27 11:59:49 UTC
This bug is marked NEEDINFO, but what info is needed?  I've lost track.
Comment 115 Petr Baudis 2009-03-24 11:40:39 UTC
Removing bogus NEEDINFO; there are some new crashes to look at, but I'm decreasing priority and severity since they seem to happen much more rarely.
Comment 116 Carlos Robinson 2009-03-24 15:04:41 UTC
(In reply to comment #115)
> Removing bogus NEEDINFO; there are some new crashes to look at, but I'm
> decreasing priority and severity since they seem to happen much more rarely.

It crashes several times per day here - see some of my last watchdog entries - I have seen it crash four times in an hour:


Mar 21 03:35:02 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 21 06:45:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 21 14:20:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 21 16:40:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 21 17:55:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 21 22:55:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 22 04:45:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 22 06:40:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 22 12:45:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 22 13:20:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 22 14:45:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 22 17:05:02 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 22 20:45:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 22 22:45:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 23 00:05:02 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 23 03:20:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 23 03:25:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 23 03:30:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 23 03:35:02 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 23 03:45:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 23 05:45:02 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 23 13:50:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 23 23:05:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd
Mar 24 02:20:01 nimrodel watchdog: nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd


You should clear all asserts from the C code: they are not logged to syslog, only to console. Not all crashes are segfaults - see the kernel log for the same period:


Mar 21 03:34:27 nimrodel kernel: nscd[3461]: segfault at bffe0178 ip b7e8e32e sp afe1d034 error 6 in libc-2.8.so[b7e20000+13d000]
Mar 21 22:51:12 nimrodel kernel: nscd[19438]: segfault at b8000012 ip b7f66825 sp addbee6c error 4 in nscd[b7f56000+1c000]
Mar 22 06:37:28 nimrodel kernel: nscd[31545]: segfault at bfffe44c ip b7fe4822 sp ade3ceec error 4 in nscd[b7fd4000+1c000]
Mar 22 20:41:34 nimrodel kernel: nscd[10540]: segfault at bfff66bc ip b80bc825 sp adf14f5c error 4 in nscd[b80ac000+1c000]
Mar 23 03:19:59 nimrodel kernel: nscd[17683]: segfault at fff1518c ip b7e42450 sp ade070b8 error 4 in libc-2.8.so[b7dd5000+13d000]
cer@nimrodel:~>
Comment 117 Petr Baudis 2009-03-25 19:49:02 UTC
Clearing asserts will just make nscd segfault a few moments later, at a place that's even much harder to debug. :-(
Comment 118 Carlos Robinson 2009-03-25 23:16:06 UTC
Of course.

Clearing an assert doesn't mean comment it out, just handle the error condition cleanly.

Once an assert triggers, it means that a situation thought impossible by the programmer has in fact happened, and thus, the code has to be changed to avoid that situation happening.

On the other hand, an assert in such a daemon just kills the daemon silently, without any message to the user/admin. An assert is intended as a message from the dead to the creator of the program, so that the creator can reprogram the cylon. This is not happening. Those asserts are useless.

Instead the assert message should be sent to syslog with "warn" or "critical" level, and then the program halted - after logging the situation -. At best, the assert could be used to restart or reinit the daemon (the idea of dividing nscd into a parent and child is not so bad).
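
Short of patching the C code, the logging half of this can be approximated from outside the daemon. The following is only a sketch of that workaround, not what nscd itself does: the helper name assertion_lines is made up, and the pattern is based on the assertion messages quoted earlier in this bug.

```shell
# Sketch only: assertion_lines is a hypothetical helper name.
# Reads nscd's console output on stdin and prints only assert() failure lines.
assertion_lines() {
    grep -E 'Assertion .* failed'
}

# Possible use while running nscd in the foreground (assumes logger is available):
#   nscd -d 2>&1 | assertion_lines | logger -t nscd -p daemon.crit
```

This only captures the message; it does not restart the daemon, so a watchdog is still needed.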
Comment 119 Petr Baudis 2009-03-25 23:24:07 UTC
Changing the code to avoid the situation happening is the hard part, unfortunately. ;-)

I agree that it would be nice if the assert() would be syslogged. I will try to make a patch when I finish the more urgent things on my hands. The asserts still certainly aren't useless, since a mere assert does not help anything anyway - you need to grab a core dump in order to really debug stuff.

95% of the crashes happen during database prune cycle; at this point, little but complete state reset can be done, and that's then pretty much equivalent to the watchdog solution.
Comment 120 Scott Lucas 2009-05-13 22:43:01 UTC
I was running openSUSE 10.2 on a 150+ node cluster with no problems, but when we upgraded to openSUSE 11.1, we started seeing all sorts of network problems. I traced many of the issues to nscd dying. After bumping up the debug level, I saw messages like this in /var/log/messages:

nscd[13710]: segfault at ffffff468bde0600 ip 00007f468aec3445 sp 00007f468080e7f8 error 4 in libc-2.9.so

I'm running NIS, so when this happens I immediately get:

do_ypcall: clnt_call: RPC: Unable to send; errno = Operation not permitted

on any nodes with nscd down. I'm running unscd as a replacement with much more success, but I wanted to submit this issue to the bug report as there doesn't appear to be a recent update. I also wanted to know if running unscd is the recommended work-around/fix for now, as well as in the future. Thanks.
Comment 122 Bob Vickers 2009-05-14 08:39:10 UTC
I can't speak for SuSE, and I recognise that nscd has to work in a
number of different environments. But my experience (in an LDAP site)
is that unscd has been a total success. Since November I have run it
on a mixture of 10.3, 11.0 and 11.1 with not a single crash. I also
run a monitor job which regularly checks the output of getent and unscd has
passed that test too.

In contrast every recent version of the SuSE nscd I have tried has
crashed frequently, sometimes as often as every few minutes. Also,
even more perniciously, it sometimes keeps running but gives wrong
information for some usernames.
Comment 123 Michael Matz 2009-05-14 11:22:49 UTC
FYI: We're considering using unscd for the upcoming releases. The instability
of upstream nscd is a constant hassle; although we have already provided
many improvements, it's still a sad story.
Comment 124 Petr Baudis 2009-06-14 22:27:25 UTC
For those interested, we have backported another patch from mainline that reportedly makes nscd considerably more stable; we will likely include this in future maintenance updates. If you need stable nscd in 11.1, please test packages that will be available at http://www.suse.de/~pbaudis/bug-505215/ - thanks!
Comment 125 Hans-Peter Jansen 2009-06-14 22:50:41 UTC
> If you need stable nscd in 11.1, please test packages that
> will be available at http://www.suse.de/~pbaudis/bug-505215/ - thanks!

Better try this:       http://www.suse.de/~pbaudis/bug-509398/

We will see what will be the outcome...
Comment 126 Toni Harbaugh-Blackford 2009-06-15 08:51:21 UTC
What are the details of bug 505215?
Comment 127 Petr Baudis 2009-06-15 12:27:29 UTC
That bug contained a support request from a customer, with some core dumps and backtraces pretty much the same as the ones attached to this bug. The outcome of the support request was the patch that I've included in the test build above.
Comment 128 Hans-Peter Jansen 2009-06-18 08:02:18 UTC
Petr, after installing the packages from http://www.suse.de/~pbaudis/bug-505215/, I didn't get another crash here in two days of running, whereas the previous versions crashed a few times per day. Unfortunately, there's an online update for 11.1 with glibc-2.10.1-3 and nscd-2.10.1-3 (note the higher version numbers!), which does NOT contain your latest fix, so those got installed today (I have no idea how to prevent this with zypper; yum had a nice per-repo exclude pattern regex for such cases).

It would be nice to get yet another glibc/nscd update containing your fix soon..
Comment 129 Hans-Peter Jansen 2009-07-01 09:37:05 UTC
Hi Petr, guess what, I managed to miss the "downgrade" noted in https://bugzilla.novell.com/show_bug.cgi?id=387202#c128 until this Monday.

But because I didn't harvest any nscd crashes in that time, something had to be wrong :-[! Indeed, since Monday, with the "official" glibc/nscd update running, it crashes every few hours again.

I think that another glibc/nscd update is not only in order, it's crucial for any serious use of 11.1, please...
Comment 130 Petr Baudis 2009-07-01 09:48:45 UTC
I'm sorry that I don't have time to rebuild the package again with the correct revision number - I think SRPMs should be in that directory too so you should be able to do that yourself easily.

A maintenance update for SLE11/11.1 is already in the making.
Comment 131 Hans-Peter Jansen 2009-07-01 10:13:36 UTC
> I'm sorry that I don't have time to rebuild the package again with the
> correct revision number - I think SRPMs should be in that directory too so
> you should be able to do that yourself easily.

Of course, I can build that myself, but then I would have to distribute it to a bunch of pretty widespread systems, install it manually, and get rid of it when the official release happens. And this procedure is further complicated by the "not so convenient" behavior of zypper (e.g. must use deprecated 'zypper dup' in order to change vendor..).

> A maintenance update for SLE11/11.1 is already in the making.

That's what I'm after. Cool. Hopefully it doesn't get delayed after the summer holiday season...
Comment 132 Roland Bernet 2009-07-01 10:27:08 UTC
Will there be also a fix for openSUSE 11.0?
Comment 133 Petr Baudis 2009-08-25 23:45:57 UTC
Unfortunately, that bugfix exposed another, quite hard-to-track-down nscd bug, which also collided with my vacation, so this did get delayed. Hopefully I have found the culprit of that one too (an inverted boolean condition in glibc-2.3.5-nscd-zeronegtimeout.diff), and as soon as we get an ack that the issues are fixed, I'll push the button.

In 11.2, unscd is already the default caching daemon instead of nscd.

Unfortunately, an 11.0 update is laborious since it does not share a codebase (and testing) with our SLE11 product, and thus it's unlikely it will be done, also since all 11.0 users that need nscd have probably found a safer way to deal with the problems than receiving a poorly tested nscd update through the maintenance channel. I recommend using unscd on 11.0 if you require a caching daemon.
Comment 134 Petr Baudis 2009-09-14 21:58:53 UTC
The update is in the process of being released.
Comment 135 Carlos Robinson 2009-09-14 22:30:23 UTC
(In reply to comment #133)

> In 11.2, unscd is already the default caching daemon instead of nscd.

And it doesn't work. At least, in my 11.2 M7 it doesn't start on boot, and "rcnscd start" fails. No messages in syslog or anywhere I can see.


> Unfortunately, 11.0 update is laborous since it does not share codebase (and
> testing) with our SLE11 product and thus it's unlikely it will be done, also
> since all 11.0 users that need nscd probably found a safer way to deal with the
> problems than receiving a poorly tested nscd update through the maintenance
> channel. I recommend to use unscd on 11.0 if you require a caching daemon.

AFAIK, all users of 11.0 "need" nscd, as it is part of the default system, and some programs or daemons may complain if nscd is not running or is not installed (dependencies).
Comment 136 Jon Nelson 2009-09-15 02:03:26 UTC
The problem with unscd and 11.2 M7 (and 11.1 for that matter) is that unscd bumps up against the as-shipped apparmor profile - unscd uses setgroups and that appears to be a no-no. Either set nscd to report-only mode or shut off apparmor (which I do not recommend) until the profile can be repaired. 

Out of curiosity, why is the profile shipped by apparmor instead of nscd/unscd?

As far as 11.0 users "needing" nscd, that's just plain bogus - I'm not aware of any software that *requires* nscd to be running (or even installed), but I'm prepared to be enlightened.
Comment 137 Walter Haidinger 2009-09-15 06:44:39 UTC
According to bug #157078 (marked as a duplicate of this bug) you _need_ nscd for Thunderbird and nss_ldap under 11.0.

Should be fixed by providing a working nscd, see 
https://bugzilla.novell.com/show_bug.cgi?id=157078#c76
Comment 138 Petr Baudis 2009-09-15 06:57:28 UTC
Carlos, then please open a new bug for that; unless it's an apparmor problem, we have that covered by bug 535467 and I fixed that one just yesterday (Jon: ...by moving the apparmor profile to the nscd package ;-).

Walter is right (fun fact: in 11.2+, nscd is also required to be running to work around some routers providing broken DNS services). You have convinced me :), I will also prepare an 11.0 nscd update; I will need your cooperation for testing the update, though.
Comment 139 Jon Nelson 2009-09-15 12:03:35 UTC
When you say "in 11.2+, nscd is also required to be running to work around some routers providing broken DNS services" - can you go into more detail? (By private mail if that is more appropriate). Personally, I use dnsmasq to provide DNS caching/validation/local resolution services because I have to deal with broken routers, too.

Pertinent to the upcoming 11.2 release:

unscd's manpage needs a healthy update (it documents none of the command line switches) and I can't make 'nscd -i' work without a second option. The invocation (usage) text needs a slight update (-i does not specify that a parameter is required) - ideally a parameterless -i invocation would use something like "all" and invalidate all of the caches...

That said, unscd is shaping up very nicely.

The problems with nss_ldap are many and scary - I can see nscd being required to solve issues with that library.


I'm very glad some nscd replacement is going to be present, and I'm overjoyed that it'll be installed by default in 11.2 - I suspect this will help out a great deal!
Comment 140 Petr Baudis 2009-09-16 11:37:51 UTC
11.2+ uses an optimized glibc name resolution mechanism that looks up IPv4 and IPv6 addresses in parallel instead of sequentially - this confuses some cheap network routers, making each process doing the resolution time out once; in case nscd is running, this is not an issue.
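To illustrate what "in parallel instead of sequentially" means here, the following is a minimal application-level sketch in Python. It is not the glibc resolver code; it only shows the idea of issuing the IPv4 and IPv6 lookups at the same time. The function names `lookup` and `lookup_parallel` are made up for this example.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def lookup(host, family):
    """Resolve host for one address family; return a sorted list of addresses.

    Any resolution failure is reported as an empty list.
    """
    try:
        infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
    except OSError:  # socket.gaierror is a subclass of OSError
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    return sorted({info[4][0] for info in infos})


def lookup_parallel(host):
    """Start the IPv4 and IPv6 lookups together and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        v4 = pool.submit(lookup, host, socket.AF_INET)
        v6 = pool.submit(lookup, host, socket.AF_INET6)
        return {"ipv4": v4.result(), "ipv6": v6.result()}
```

Calling `lookup_parallel("localhost")` returns a dictionary with an `ipv4` and an `ipv6` list. A sequential variant would simply call `lookup` twice in a row, so a slow or dropped answer for one family would delay the other.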

If you have any problems with unscd, please open separate bugs for them - this bug is already huge and it's hard to track the unscd problems this way.
Comment 141 Petr Baudis 2009-11-16 19:34:31 UTC
An 11.0 test nscd package is now available at http://www.suse.de/~pbaudis/bug-387202/ (the URL should start working in ~1 hour) - could 11.0 users please test if it works well for them? We can then release it as a maintenance update.
Comment 142 Petr Baudis 2010-01-19 02:07:38 UTC
It seems no one who cares about nscd is still using 11.0; fair enough, I will close this bug, as we cannot release an untested maintenance update.
Comment 143 Carlos Robinson 2010-01-19 16:20:38 UTC
(In reply to comment #142)
> It seems noone who cares about nscd is still using 11.0; fair enough, I will
> close this bug, we cannot release an untested maintenance update.

Sorry. 

This bug did not have "needinfo" when I asked bugzilla to display "my" buglist, so I didn't notice it when I looked. Blame radar failure.

(and anyway, I have a watch daemon that restarts the dead nscd automatically).

I'm using both 11.0 and 11.2, so I will attempt testing your package this weekend at the latest.
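For readers who want something like the watch daemon mentioned above, here is a minimal sketch in Python. It is not the script used in this comment nor the sample watchdog script attached to this bug. It assumes that `pgrep` is installed and that "rcnscd start" (the command quoted in comment 135) starts the daemon; the function names and the 60-second interval are made up for illustration.

```python
import subprocess
import time


def nscd_is_running():
    """Return True if a process named exactly 'nscd' exists (assumes pgrep)."""
    result = subprocess.run(["pgrep", "-x", "nscd"], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def restart_nscd():
    """Start nscd via the init script wrapper (assumed to be 'rcnscd start')."""
    subprocess.run(["rcnscd", "start"], check=False)


def watch_once(is_running, restart):
    """Run one check; call restart() and return True only if the daemon is down."""
    if is_running():
        return False
    restart()
    return True


def watch_forever(interval_seconds=60.0):
    """Check periodically and restart nscd whenever it has died."""
    while True:
        watch_once(nscd_is_running, restart_nscd)
        time.sleep(interval_seconds)
```

Running `watch_forever()` as root keeps checking until interrupted. Passing the check and restart steps in as callables keeps `watch_once` testable without touching the real daemon.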
Comment 144 Carlos Robinson 2010-01-19 23:54:00 UTC
I have downloaded your rpm. However, that rpm is version 2.8.4-14.4, and I already had installed that version, and as I commented on #135, it crashes.

I have installed your version, nonetheless, and I will report what happens.
[...]
Four hours later it is still running fine. I will report on the weekend if it does not crash, or earlier if it does. I'll attempt to leave the bugzilla as needinfo from myself.
Comment 145 Carlos Robinson 2010-01-24 17:39:07 UTC
(In reply to comment #144)
> I have downloaded your rpm. However, that rpm is version 2.8.4-14.4, and I
> already had installed that version, and as I commented on #135, it crashes.
> 
> I have installed your version, nonetheless, and I will report what happens.
> [...]
> Four hours later it is still running fine. I will report on the weekend if it
> does not crash, or earlier if it does. I'll attempt to leave the bugzilla as
> needinfo from myself.

cer@nimrodel:~> ps axu | grep nscd
root      7239  0.0  0.0 141508  1024 ?        Ssl  Jan19   0:07 /usr/sbin/nscd

And today is the 24th, so it hasn't crashed. True, I have hibernated the machine, so it hasn't been running that many hours, but it is good news: it hasn't crashed yet.

Looks good!
Comment 146 Roland Bernet 2010-01-25 09:27:15 UTC
I have installed the new version of nscd too and have not seen any crashes on these machines in the last few days (usually I get around one crash a day).
Seems to be a great improvement.
(Sorry, I missed the request to check the new patch too.)
Comment 147 Petr Baudis 2010-01-25 13:17:15 UTC
Thank you all for testing!

Maintenance, we are getting positive feedback, ok to release the update? Can we have a SWAMPID?
Comment 148 Marcus Meissner 2010-01-25 17:08:00 UTC
I am undecided.... a large change which might be risky, but fixing some bugs for customers at least ... :/

I would however say yes ... +1
Comment 149 Michal Marek 2010-01-25 17:12:15 UTC
If I recall my 11.0 experience with nscd... The biggest risk would be that nscd starts working suddenly :). So +1 for an update.
Comment 150 Bob Vickers 2010-01-25 17:25:11 UTC
I suspect the people who need a working nscd in an LDAP environment gave up on the default one a long time ago and switched to unscd, so that's probably why you didn't get more feedback.
Comment 151 Swamp Workflow Management 2010-01-26 10:38:05 UTC
The SWAMPID for this issue is 30491.
Please submit the patch and patchinfo file using this ID.
(https://swamp.suse.de/webswamp/wf/30491)
Comment 152 Anja Stock 2010-01-26 10:41:35 UTC
convinced by security ;)
Comment 153 Petr Baudis 2010-01-27 01:19:21 UTC
Thanks, patchinfo and package submitted.
Comment 154 Swamp Workflow Management 2010-02-03 09:19:12 UTC
Update released for: nscd
Products:
openSUSE 11.0 (i386, ppc, x86_64)