Bugzilla – Bug 213249
rcnetwork {stop,restart} fails
Last modified: 2007-06-05 09:37:05 UTC
rcnetwork {stop,restart} fails. When running 'rcnetwork stop' on a system using NetworkManager, the PID files are not removed from /var/run/. This prevents a subsequent 'rcnetwork start' from succeeding. The script /etc/init.d/network did not change recently, and the same applies to NetworkManager. Probably a bug in sysvinit (killproc). Adding Andreas and Seife to CC as they ran into the same issue.
Created attachment 101839 [details] rcnetwork logs
Now that was fun. Not. sysconfig recently introduced replacements for {check,kill}proc. Both are bash functions (part of sysconfig/network/scripts/functions). The checkproc function is broken as it returns 0 for a dead program with a stale PID file. cf. checkproc(8):
  "The exit codes without the option -k have the following LSB conform conditions:"
    0  Program is running
    1  No process but pid file found
    (...)
Both NetworkManagerDispatcher and dhcdbd are not cleaning up their PID files on SIGTERM properly; however, NetworkManager itself does. I'm really having a hard time finding the one to blame for all the mess, as the changelog for sysconfig does not even mention the introduction of those replacements for {check,kill}proc.
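For illustration, the exit codes quoted above could be honoured by a shell function along the following lines. This is only a minimal sketch: `pid_status` is a hypothetical name, it is neither the sysconfig replacement nor sysvinit's checkproc, and it assumes the caller may signal the process (as an init script running as root can).

```shell
# Hypothetical helper, NOT the sysconfig or sysvinit implementation.
# Usage: pid_status PIDFILE
# Returns 0 if the PID recorded in PIDFILE is alive,
#         1 if PIDFILE exists but the process is gone (stale PID file),
#         3 if there is no PIDFILE at all.
pid_status() {
    ps_file=$1
    [ -f "$ps_file" ] || return 3
    ps_pid=$(cat "$ps_file" 2>/dev/null)
    case $ps_pid in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content: treat as stale
    esac
    # kill -0 sends no signal; it only checks whether the PID can be signalled.
    # Assumption: the caller has permission to signal the process.
    if kill -0 "$ps_pid" 2>/dev/null; then
        return 0
    fi
    return 1
}
```

The point of the sketch is only that a stale PID file must not be reported as 0; a caller can then remove the file or refuse to start, depending on the return value.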
Created attachment 102219 [details] The fix.
Submitted to STABLE.
The package changelog is indeed missing this change from the svn changelog. (There is a convention that committers to sysconfig update the file "package/sysconfig.changes" in svn as well, which is used when rolling the next tarball/rpm. So normally the change would have ended up in the package changelog.) I hadn't worked on sysconfig for a year, and didn't think of this. Sorry about that. The change and its rationale can be found here: http://svn.suse.de/viewcvs/sysconfig?rev=1500&view=rev Thank you for the fix, and please forgive the aggravating debugging.
What is the reason for those ``replacements'' for pidof(8), checkproc(8), and killproc(8)???
As far as I know, the bug that led to this change was bug #55370. At least for systems using NetworkManager, I am strongly in favor of reverting this change.
-> maintainer sysconfig
This bug is definitely fixed! What you guys want to discuss is bug 55370. Werner, if you have a better solution for the nfs-pidof-problem, then let me know in bug 55370.
The bash replacements are incomplete; at least they are not man enough to ensure that 'rcnetwork restart' works as before. At a guess, 'rcnetwork restart' fails in 4 out of 10 runs. Please revert that change. The reporter of bug 55370 agrees on that.
-> Peter
Comment #12 refers to systems running with NetworkManager.
I am on vacation now. Can someone else please take care of the bug? Thanks. I suggest to 1) test whether the /usr issue is fixed at all. As I wrote in the other bug, even though checkproc and killproc should work with it, there may well be other parts of sysconfig which would still cause a hang. So far, I haven't seen any test feedback. 2) Unless we know that the /usr issue is fixed, the checkproc/killproc replacements are not worth much and we don't need to put up with other bugs that result from them. So I suggest again to simply revert the change, and be done with it. No need to have this blocker bug. If 1) works and we want to handle a disappeared NFS mount gracefully, it may be worth debugging the issue. rcnetwork restart worked reliably for me; I didn't see any problems. But I didn't use NetworkManager, and I have no idea how it would depend on the way rcnetwork restart works internally. I don't know how it is integrated. BTW, I have talked to Werner about possible modifications in sysvinit's checkproc and killproc to handle the situation more gracefully. There may be a possibility, because a stat() on a file on a hanging NFS mount turns out to be interruptible at least. But we haven't come to a final conclusion about it yet.
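As a rough illustration of the "interruptible stat()" idea mentioned above, a script could probe a path under a time limit instead of blocking indefinitely. This is a sketch only: `path_reachable` is a hypothetical name, it relies on the `timeout` utility from GNU coreutils, and it only helps if the stat() call can actually be interrupted, which the comment above suggests but which depends on the mount.

```shell
# Hypothetical helper: probe a path without blocking forever.
# Usage: path_reachable PATH [SECONDS]
# Returns 0 if stat succeeds within the limit (default 2 seconds),
# 124 if timeout(1) had to stop the probe, otherwise stat's own error status.
# Assumption: GNU coreutils 'timeout' is installed and the blocked stat()
# is interruptible by a signal.
path_reachable() {
    timeout "${2:-2}" stat -- "$1" >/dev/null 2>&1
}
```

A caller could then skip binaries under an unreachable /usr rather than hang, for example `path_reachable /usr/bin || echo "skipping check: /usr not reachable"`.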
I'll revert everything except pidof in ifup-dhcp. Is that reasonable, Peter?
The problem with dhcdbd is fixed. Have a look at Bug 222267 - "dhcdbd do not remove pid file". Now dhcdbd removes the PID file and rcnetwork stop works fine. I've made some experiments with rcnetwork restart. After inserting "sleep 2" between stop and start, it works fine too. But I do not know whether that is the right solution.
Christian, it is fine to revert the change, with or without the pidof replacement. Peter, if the dhcdbd fix resolves only a part of this bug, as you say, I'm not sure what remains. I'm not sure what to suggest here. Thanks for taking care!
Removed the replacements for checkproc, killproc, and pidof. We now use an improved my_pidof only in ifup-dhcp.
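A possible alternative to the fixed "sleep 2" mentioned above would be to poll until the PID file is really gone, with an upper bound on the wait. The following is only a sketch under assumptions: `wait_for_pidfile_removal` is a hypothetical helper and the path in the usage comment is a placeholder, neither is taken from the network scripts.

```shell
# Hypothetical helper: wait until PIDFILE disappears or TIMEOUT seconds pass.
# Usage: wait_for_pidfile_removal PIDFILE [TIMEOUT]
# Returns 0 once the file is gone, 1 if it still exists after TIMEOUT
# seconds (default 5).
wait_for_pidfile_removal() {
    wfp_file=$1
    wfp_left=${2:-5}
    while [ -e "$wfp_file" ]; do
        [ "$wfp_left" -gt 0 ] || return 1
        sleep 1
        wfp_left=$((wfp_left - 1))
    done
    return 0
}

# Possible use between stop and start (the path is a placeholder):
#   rcnetwork stop
#   wait_for_pidfile_removal /var/run/example.pid 5 && rcnetwork start
```

Compared with a fixed sleep, this returns as soon as the daemon has cleaned up and reports a failure if it never does, though it does not explain why the delay is needed in the first place.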
Package for RC1 will be submitted by jg soon. You may test it if you like: /work/built/mbuild/hall-zoz-2
*** Bug 220499 has been marked as a duplicate of this bug. ***