Bug 213249

Summary: rcnetwork {stop,restart} fails
Product: [openSUSE] openSUSE 10.2 Reporter: Timo Hoenig <thoenig>
Component: Basesystem Assignee: Christian Zoz <zoz>
Status: VERIFIED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Blocker    
Priority: P5 - None CC: aj, werner
Version: Beta 2 plus   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: Other Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: rcnetwork logs
The fix.

Description Timo Hoenig 2006-10-18 09:11:41 UTC
rcnetwork {stop,restart} fails.

When running 'rcnetwork stop' on a system using NetworkManager, the PID files are not removed from /var/run/.  This prevents a subsequent 'rcnetwork start' from succeeding.

The script /etc/init.d/network has not changed recently, and neither has NetworkManager. Probably a bug in sysvinit (killproc).

Adding Andreas and Seife to CC as they ran into the same issue.
Comment 1 Timo Hoenig 2006-10-18 09:12:30 UTC
Created attachment 101839 [details]
rcnetwork logs
Comment 3 Timo Hoenig 2006-10-21 22:32:52 UTC
Now that was fun.  Not.

sysconfig recently introduced replacements for {check,kill}proc.  Both are bash functions (part of sysconfig/network/scripts/functions).  The checkproc function is broken as it returns 0 for a dead program with a stale PID file.

cf. checkproc(8)

"The exit codes without the option -k have the following LSB conform conditions:"
0    Program is running
1    No process but pid file found
(...)
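
To illustrate the exit-code contract quoted above, here is a minimal sketch of a PID-file check that tells a live process apart from a stale PID file. It is not the sysconfig function nor sysvinit's checkproc; the name check_pidfile is hypothetical, and the return value 3 for a missing PID file is an assumption that follows the LSB "program is not running" status rather than anything stated in this report.

```shell
#!/bin/sh
# Hypothetical helper (not the sysconfig or sysvinit code).
# Returns: 0 = process is running
#          1 = no process, but a PID file was found (stale PID file)
#          3 = no PID file at all (assumed, per LSB "not running")
check_pidfile() {
    local pidfile="$1" pid=""
    [ -f "$pidfile" ] || return 3
    # read may report failure on a file without a trailing newline,
    # so ignore its status and validate the value instead.
    read -r pid < "$pidfile" || true
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
    esac
    # kill -0 sends no signal; it only tests whether the PID can be
    # signalled. /proc is a Linux-specific fallback for processes owned
    # by another user.
    if kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; then
        return 0
    fi
    return 1
}
```

The point of the sketch is the last branch: a PID file whose process is gone must yield 1, not 0, otherwise callers believe the service is still running.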

Both NetworkManagerDispatcher and dhcdbd are not cleaning up their PID files on SIGTERM properly; however, NetworkManager itself does.
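
For illustration only, this is roughly what "cleaning up the PID file on SIGTERM" looks like in a shell-based daemon. NetworkManagerDispatcher and dhcdbd are not shell scripts, so this is a sketch of the behaviour, not of their code; run_daemon and the PID file argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical daemon skeleton: writes a PID file and removes it on SIGTERM.
run_daemon() {
    local pidfile="$1"
    # Install the handler before the PID file becomes visible, so anyone
    # who sees the file can already terminate us cleanly.
    trap 'rm -f "$pidfile"; exit 0' TERM
    # $$ would report the parent shell when this function runs in a
    # background subshell, so ask a child process for its parent's PID.
    sh -c 'echo "$PPID"' > "$pidfile"
    while :; do
        # Sleep in the background and wait for it: wait is interrupted
        # by signals, so the trap runs promptly.
        sleep 1 &
        wait $!
    done
}
```

Started as 'run_daemon /path/to/demo.pid &' and stopped with 'kill -TERM', the function leaves no PID file behind, which is what a later start relies on.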

I'm really having a hard time finding the one to blame for all the mess as the changelog for sysconfig does not even mention the introduction of those replacements for {check,kill}proc.
Comment 4 Timo Hoenig 2006-10-21 22:33:31 UTC
Created attachment 102219 [details]
The fix.
Comment 6 Timo Hoenig 2006-10-21 22:36:53 UTC
Submitted to STABLE.
Comment 7 Peter Poeml 2006-10-23 06:11:29 UTC
The package changelog is indeed missing this change from the svn
changelog.

(There is a convention that committers to sysconfig update the file
"package/sysconfig.changes" in svn as well, which is used when rolling
the next tarball/rpm. So normally the change would have ended up in the
package changelog.)

I hadn't worked on sysconfig for one year, and didn't think of this.
Sorry about that.

The change and its rationale can be found here:
http://svn.suse.de/viewcvs/sysconfig?rev=1500&view=rev

Thank you for the fix, and please forgive the aggravated debugging.
Comment 8 Dr. Werner Fink 2006-11-02 10:16:47 UTC
What is the reason for those ``replacements'' for pidof(8), checkproc(8),
and killproc(8)???
Comment 9 Timo Hoenig 2006-11-02 10:20:46 UTC
As far as I know, the bug that led to this change was bug #55370.

At least for systems using NetworkManager, I am strongly in favor of reverting this change.
Comment 10 Timo Hoenig 2006-11-02 10:30:03 UTC
-> maintainer sysconfig
Comment 11 Christian Zoz 2006-11-02 11:24:26 UTC
This bug is definitively fixed!

What you guys want to discuss is bug 55370. Werner, if you have a better solution for the nfs-pidof-problem, then let me know. In bug 55370.
Comment 12 Timo Hoenig 2006-11-14 16:55:21 UTC
The bash replacements are incomplete; at least they are not man enough to ensure that 'rcnetwork restart' works as before.  At a guess, 'rcnetwork restart' fails in 4 out of 10 runs.

Please revert that change.

The reporter of bug 55370 agrees with that.
Comment 13 Timo Hoenig 2006-11-14 16:55:55 UTC
-> Peter
Comment 14 Timo Hoenig 2006-11-14 16:56:32 UTC
Comment #12 refers to systems running with NetworkManager.
Comment 18 Peter Poeml 2006-11-17 12:07:48 UTC
I am on vacation now. Can someone else please take care of the bug?
Thanks.

I suggest:
 1) test if the /usr issue is fixed at all. As I wrote in the other bug,
    even though checkproc and killproc should work with it there may well be
    other parts of sysconfig which still would cause a hang. So far, I
    haven't seen any test feedback.
 2) unless we know that the /usr issue is fixed, the
    checkproc/killproc replacements are not worth much and we don't need to
    put up with other bugs that result from them. So I suggest again to
    simply revert the change, and be done with it. No need to have this
    blocker bug.

If 1) works and we want to handle disappeared nfs gracefully, it may be
worth debugging the issue. rcnetwork restart worked reliably for me; I
didn't see any problems. But I didn't use NetworkManager, and I have no
idea how it would depend on the way rcnetwork restart works internally.
I don't know how it is integrated.

BTW, I have talked to Werner about possible modifications in sysvinit's
checkproc and killproc to handle the situation more gracefully. There
may be a possibility because a stat() on a file on a hanging nfs mount
turns out to be interruptible at least. But we didn't come to a final
conclusion about it yet.
Comment 19 Christian Zoz 2006-11-17 16:14:44 UTC
I'll revert everything except pidof in ifup-dhcp. Is that reasonable, Peter?
Comment 20 Peter Varkoly 2006-11-17 18:32:03 UTC
The problem of dhcdbd is fixed. 
Have a look at Bug 222267 - "dhcdbd do not remove pid file"
Now dhcdbd removes the pid file and rcnetwork stop works fine.
I've made some experiments with rcnetwork restart.
After inserting "sleep 2" between stop and start it works fine too.
But I do not know whether that is a/the solution.
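
One possible alternative to a fixed "sleep 2", sketched here only as an illustration and not as the fix that went into sysconfig, is to poll until the PID files are gone before starting again. The helper name wait_for_pidfiles_gone and the PID file path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check up to N times, one second apart, whether all
# given PID files have disappeared.
# Returns 0 once none of them exist, 1 if some still exist after N checks.
wait_for_pidfiles_gone() {
    local tries="$1" f pending
    shift
    while :; do
        pending=0
        for f in "$@"; do
            if [ -e "$f" ]; then pending=1; fi
        done
        [ "$pending" -eq 0 ] && return 0
        tries=$((tries - 1))
        [ "$tries" -le 0 ] && return 1
        sleep 1
    done
}

# Usage sketch (the PID file path is a placeholder, not taken from this bug):
#   rcnetwork stop
#   wait_for_pidfiles_gone 10 /var/run/example.pid && rcnetwork start
```

Unlike a fixed sleep, this returns as soon as the files vanish and reports a failure if they never do, but it only helps if the daemons actually remove their PID files.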
Comment 21 Peter Poeml 2006-11-18 09:51:02 UTC
Christian, it is fine to revert the change, with or without pidof
replacement. 

Peter, if the dhcdbd fix resolves only a part of this bug, as you
say, I'm not sure what remains. I'm not sure what to suggest here.

Thanks for taking care!
Comment 22 Christian Zoz 2006-11-20 11:43:46 UTC
Removed replacements for checkproc, killproc, pidof.

We now use an improved my_pidof only in ifup-dhcp.
Comment 23 Christian Zoz 2006-11-20 12:27:02 UTC
Package for RC1 will be submitted by jg soon.

You may test it if you like: /work/built/mbuild/hall-zoz-2
Comment 24 Tambet Ingo 2006-11-20 14:42:28 UTC
*** Bug 220499 has been marked as a duplicate of this bug. ***