Bug 134899

Summary: rcnetwork start does not always wait for dhcpcd completion for mandatory devices
Product: [openSUSE] SUSE LINUX 10.0
Reporter: Stefan Fink <stefan.fink>
Component: Network
Assignee: Christian Zoz <zoz>
Status: VERIFIED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Critical
Priority: P5 - None
CC: werner
Version: Final   
Target Milestone: ---   
Hardware: i686   
OS: SuSE Linux 10.0   
Whiteboard:
Found By: Customer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Stefan Fink 2005-11-22 13:48:50 UTC
Hello

Our Linux workstations running SuSE 10.0 do not always start correctly. Specifically, they do not always get an IP address from the DHCP server before they try to mount some NFS shares.

First investigations showed that the problem disappears if RUN_PARALLEL is set to false in /etc/sysconfig/boot.

Further analysis showed that /etc/init.d/network first forks ifup for all already existing interfaces, which in turn forks dhcpcd. Ifup then waits (at least for a user-defined number of seconds) for dhcpcd to get the interface configured (by calling ifup with -o dhcp). Up to this point everything works fine.

If, on the other hand, the interface is not present at the moment /etc/init.d/network is forked, /etc/init.d/network will (correctly) handle it as a mandatory device. /etc/init.d/network then just waits for the hotplug system to bring up the device but DOES NOT wait for dhcpcd to configure it. And this is definitely a bug. /etc/init.d/network should wait not only for the physical presence of the interface but also for its configuration. By not doing so, the behavior of /etc/init.d/network is not deterministic, and other init scripts which rely on completion of /etc/init.d/network will fail (like NFS in the example above).

Finally we resolved the problem by replacing line 419 in /etc/init.d/network

  status -m $IFACE &>/dev/null

with

  /sbin/ip -f inet addr show $IFACE | grep "inet"

This way we really wait for dhcpcd to configure the interface and not only for the presence of the interface.
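
The same idea can be sketched as a bounded polling loop. This is only an illustration, not the code in /etc/init.d/network: the helper names has_ipv4 and wait_for_ipv4 and the IP_CMD override are hypothetical, and the one-second polling interval is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers, not part of /etc/init.d/network.

# has_ipv4 IFACE: succeeds if IFACE currently has an IPv4 address.
# IP_CMD is an assumed override hook so the check can be tried without
# touching a real interface; it defaults to /sbin/ip as in the report.
has_ipv4() {
    ${IP_CMD:-/sbin/ip} -f inet addr show "$1" 2>/dev/null | grep -q "inet "
}

# wait_for_ipv4 IFACE TIMEOUT: poll once per second until IFACE has an
# IPv4 address (return 0) or TIMEOUT seconds have elapsed (return 1).
wait_for_ipv4() {
    iface=$1
    remaining=$2
    while ! has_ipv4 "$iface"; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

A caller could use it as "wait_for_ipv4 eth0 20 || echo 'eth0 has no address yet'". Unlike the one-line replacement above, the loop has an explicit upper bound, so a missing DHCP server cannot block the boot indefinitely.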

I guess that there are better ways of achieving the same goal.

Regards

Stefan Fink

PS: don't hesitate to contact me, I've spent quite a lot of time dissecting the problem. :-)
Comment 1 Christian Zoz 2005-11-25 09:53:12 UTC
You are right, it happens only while waiting for mandatory devices. But it does not always happen. There is another silly race condition.

Please test this:
change rcnetwork back to its original state
apply this patch to /sbin/ifup
--- /sbin/ifup  2005-11-17 11:01:46.000000000 +0100
+++ /sbin/ifup  2005-11-25 09:34:15.000000000 +0100
@@ -1157,7 +1169,7 @@
        done
 fi

-if [ "$dhcpretcode" = $R_DHCP_BG ] ; then
+if [ "$dhcpretcode" != $R_SUCCESS ] ; then
        exithook $R_DHCP_BG
 else
        if [ "$retcode" = 0 -a -n "$retcode_mtu" -a "$retcode_mtu" != 0 ] ; then

This should fix the problem. Please tell me if it does.
Comment 2 Stefan Fink 2005-11-25 13:32:52 UTC
It works, but now the setup of other non-DHCP interfaces does not return with exit code 0. Typically this is the case for lo, and when booting I get one of those ugly red "failed" messages on the boot screen (nevertheless lo is set up correctly).

Because we only need to return $R_DHCP_BG when calling ifstatus and only if the bootproto is dhcp I changed the condition in the if statement from:

if [ "$dhcpretcode" != $R_SUCCESS ] ; then

to:

if [ "$SCRIPTNAME" = ifstatus -a "$BOOTPROTO" = dhcp -a "$dhcpretcode" != $R_SUCCESS ] ; then

This way everything seems to work when waiting for mandatory devices. I've also cross-checked with the original version, which still produced errors.
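
A simplified sketch of how the narrowed condition behaves for a few cases. The function exit_code_for is hypothetical and collapses the rest of ifup's exit handling into a plain success code; R_SUCCESS=0 is an assumed value, and R_DHCP_BG=12 is taken from a later comment in this report.

```shell
#!/bin/sh
R_SUCCESS=0     # assumed value, not shown in this report
R_DHCP_BG=12    # "retval 12 (R_DHCP_BG)" according to comment 3

# exit_code_for SCRIPTNAME BOOTPROTO DHCPRETCODE
# Hypothetical stand-in for the end of ifup: prints the code that the
# narrowed condition would select. Everything else ifup does in the
# else branch is reduced to R_SUCCESS here.
exit_code_for() {
    SCRIPTNAME=$1
    BOOTPROTO=$2
    dhcpretcode=$3
    if [ "$SCRIPTNAME" = ifstatus -a "$BOOTPROTO" = dhcp -a "$dhcpretcode" != $R_SUCCESS ] ; then
        echo $R_DHCP_BG
    else
        echo $R_SUCCESS
    fi
}

exit_code_for ifup static ""        # lo-like case: prints 0
exit_code_for ifstatus dhcp ""      # DHCP not finished: prints 12
exit_code_for ifstatus dhcp 0       # DHCP succeeded: prints 0
```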

There is just one detail which is not so nice. We now (same for my patch) only wait WAIT_FOR_INTERFACES seconds for the interface to be hotplugged AND configured via DHCP. In the case where the interface is already present when rcnetwork is forked, we wait DHCLIENT_WAIT_AT_BOOT seconds. I find it inconsistent to wait for two different values in the two cases. At least we should wait for

WAIT_FOR_INTERFACES + DHCLIENT_WAIT_AT_BOOT

when waiting for mandatory dhcp devices.

Stefan


 
Comment 3 Christian Zoz 2005-11-25 17:34:41 UTC
Thanks for further investigation. I guess we just need to properly initialize dhcpretcode somewhere. Try this patch instead:

--- /sbin/ifup.orig     2005-11-25 17:53:40.000000000 +0100
+++ /sbin/ifup  2005-11-25 17:56:13.000000000 +0100
@@ -826,6 +826,7 @@
 # bringing up/down the interface (main part)
 #
 
+dhcpretcode=$R_SUCCESS
 # switch type. If SKIP_MAIN_PART == skip, don't execute any section
 case "$BOOTPROTO$SKIP_MAIN_PART" in
        dhcp+autoip|DHCP+AUTOIP)

Please try.
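
Why the initialization matters can be reproduced in isolation. The sketch below is an illustration only: final_code is a hypothetical stand-in for the test at the end of ifup as patched in comment 1, and R_SUCCESS=0 is an assumed value.

```shell
#!/bin/sh
R_SUCCESS=0     # assumed value, not shown in this report
R_DHCP_BG=12    # "retval 12 (R_DHCP_BG)"

# final_code: hypothetical stand-in for the patched test from comment 1.
final_code() {
    if [ "$dhcpretcode" != $R_SUCCESS ] ; then
        echo $R_DHCP_BG
    else
        echo $R_SUCCESS
    fi
}

# Non-DHCP interface such as lo: the DHCP branch never ran, so the
# variable is unset, compares as "" and is therefore != 0.
unset dhcpretcode
final_code      # prints 12

# With the initialization from the patch above, the same path is clean.
dhcpretcode=$R_SUCCESS
final_code      # prints 0
```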

To the timeouts:
DHCLIENT_WAIT_AT_BOOT is just the time that ifup-dhcp waits before it sends the DHCP client to the background and returns with retval 12 (R_DHCP_BG). If ifup returns that, rcnetwork will not consider it a success. Instead it waits some additional seconds if this is a mandatory device. rcnetwork times out WAIT_FOR_INTERFACES seconds after it was started.
So if the interface comes shortly before rcnetwork times out, then the time for dhcp might be very short. I will open another bug for this: bug 135569
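
The timing problem can be put into numbers with a small sketch. The values 20 and 15 are purely illustrative and not read from any configuration file, and dhcp_time_left is a hypothetical helper.

```shell
#!/bin/sh
WAIT_FOR_INTERFACES=20      # illustrative value only
DHCLIENT_WAIT_AT_BOOT=15    # illustrative value only

# dhcp_time_left APPEARED_AT: seconds remaining for DHCP when a
# mandatory interface shows up APPEARED_AT seconds after rcnetwork
# was started, given that rcnetwork times out WAIT_FOR_INTERFACES
# seconds after its own start.
dhcp_time_left() {
    left=$((WAIT_FOR_INTERFACES - $1))
    [ "$left" -lt 0 ] && left=0
    echo "$left"
}

dhcp_time_left 2     # prints 18: plenty of time for DHCP
dhcp_time_left 19    # prints 1: DHCP will almost certainly not finish

# Combined budget suggested in comment 2 for mandatory DHCP devices:
echo $((WAIT_FOR_INTERFACES + DHCLIENT_WAIT_AT_BOOT))    # prints 35
```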
Comment 4 Stefan Fink 2005-11-29 12:21:48 UTC
Finally back at work and ready for testing.

Your patch works fine (with mine removed of course).

For me the problem seems resolved. The problem with the timeouts is minor (at least if you understand the usage of the two incriminated configuration variables).

At least the bug showed me that network initialisation is much more complex than one would think. :-))

Thanks for your help.
Comment 5 Christian Zoz 2005-11-30 11:13:52 UTC
Fixed in SVN.

Andreas, should I add this to the pending YOUpdate?
I don't know if it happens on many machines.
Comment 6 Andreas Jaeger 2005-11-30 12:20:48 UTC
Yes, add it to the pending YOUpdate if possible.

Otherwise only for STABLE.
Comment 7 Christian Zoz 2006-01-20 10:12:39 UTC
YOUpdate is finally submitted
Comment 8 Christian Zoz 2006-02-13 11:09:12 UTC
This patch now conflicted with other fixes around STARTMODE=ifplugd (see bug 140124 and duplicates). Therefore I removed the (bad) return value magic at the very end of ifup. Instead, ifup-dhcp checks whether there is an ifup process running on this interface and assumes that connecting is in progress.