Bugzilla – Bug 134899
rcnetwork start does not always wait for dhcpcd completion for mandatory devices
Last modified: 2007-06-05 09:37:08 UTC
Hello,

Our Linux workstations running SuSE 10.0 do not always start correctly: they do not always get an IP address from the DHCP server before they try to mount some NFS shares. First investigations showed that the problem disappears if RUN_PARALLEL is set to false in /etc/sysconfig/boot.

Further analysis showed that /etc/init.d/network first forks ifup for all already existing interfaces, which in turn forks dhcpcd. ifup then waits (at least for a user-defined number of seconds) for dhcpcd to get the interface configured (by calling ifup with -o dhcp). Up to this point everything works fine.

If, on the other hand, the interface is not present at the moment /etc/init.d/network is forked, /etc/init.d/network will (correctly) handle it as a mandatory device. /etc/init.d/network now just waits for the hotplug system to bring up the device but DOES NOT wait for dhcpcd to configure it. And this is definitely a bug: /etc/init.d/network should wait not only for the physical presence of the interface but also for its configuration. Because it does not, the behavior of /etc/init.d/network is not deterministic, and other init scripts that rely on the completion of /etc/init.d/network will fail (like NFS in the example above).

Finally, we resolved the problem by replacing line 419 in /etc/init.d/network

    status -m $IFACE &>/dev/null

with

    /sbin/ip -f inet addr show $IFACE | grep "inet"

This way we really wait for dhcpcd to configure the interface and not only for the presence of the interface. I guess that there are better ways of achieving the same goal.

Regards
Stefan Fink

PS: don't hesitate to contact me, I've spent quite a lot of time dissecting the problem. :-)
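The workaround above only checks once whether the interface has an IPv4 address. A minimal sketch of the same check wrapped in a polling loop with a timeout might look like this. The names `IP_CMD`, `has_inet_addr` and `wait_for_inet` are hypothetical and do not appear in /etc/init.d/network; only the `ip -f inet addr show` call is taken from the report.

```shell
# Sketch only: wait until an interface has an IPv4 address, or give up.
# IP_CMD, has_inet_addr and wait_for_inet are illustrative names (assumptions),
# not functions from /etc/init.d/network.
IP_CMD=${IP_CMD:-/sbin/ip}

# Succeeds if the interface currently has at least one IPv4 address.
has_inet_addr() {
    $IP_CMD -f inet addr show "$1" 2>/dev/null | grep -q "inet "
}

# wait_for_inet IFACE TIMEOUT_SECONDS
# Polls once per second; returns 0 once an address is present, 1 on timeout.
wait_for_inet() {
    iface=$1
    remaining=$2
    while ! has_inet_addr "$iface"; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

A caller could then run something like `wait_for_inet eth0 20 || echo "eth0 not configured"`, where both the interface name and the 20 seconds are placeholder values.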
You are right, it happens only while waiting for mandatory devices. But it does not always happen. There is another silly race condition. Please test this:

- change rcnetwork back to its original state
- apply this patch to /sbin/ifup

--- /sbin/ifup 2005-11-17 11:01:46.000000000 +0100
+++ /sbin/ifup 2005-11-25 09:34:15.000000000 +0100
@@ -1157,7 +1169,7 @@
 done
 fi
-if [ "$dhcpretcode" = $R_DHCP_BG ] ; then
+if [ "$dhcpretcode" != $R_SUCCESS ] ; then
 exithook $R_DHCP_BG
 else
 if [ "$retcode" = 0 -a -n "$retcode_mtu" -a "$retcode_mtu" != 0 ] ; then

This should fix the problem. Please tell me if it does.
It works, but now the setup of other non-DHCP interfaces no longer returns with exit code 0. Typically this is the case for lo, and when booting I get one of those ugly red "failed" messages on the boot screen (nevertheless, lo is set up correctly).

Because we only need to return $R_DHCP_BG when calling ifstatus, and only if the bootproto is dhcp, I changed the condition in the if statement from:

    if [ "$dhcpretcode" != $R_SUCCESS ] ; then

to:

    if [ "$SCRIPTNAME" = ifstatus -a "$BOOTPROTO" = dhcp -a "$dhcpretcode" != $R_SUCCESS ] ; then

This way everything seems to work when waiting for mandatory devices. I also cross-checked against the original version, which still produced errors.

There is just one detail which is not so nice. We now (same for my patch) only wait for WAIT_FOR_INTERFACES seconds for the interface to be hotplugged AND configured via DHCP. In the case where the interface is already present when rcnetwork is forked, we wait DHCLIENT_WAIT_AT_BOOT seconds. I find it inconsistent to wait for two different values in the two cases. At the very least, we should wait for WAIT_FOR_INTERFACES + DHCLIENT_WAIT_AT_BOOT when waiting for mandatory DHCP devices.

Stefan
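The difference between the two conditions can be sketched with a pair of small test functions. This is an illustration, not code from /sbin/ifup: the function names are hypothetical, `R_SUCCESS=0` is an assumption, and the empty `dhcpretcode` for lo is an assumed explanation for the "failed" message on non-DHCP interfaces.

```shell
# Return codes: R_DHCP_BG=12 is the value named in this report;
# R_SUCCESS=0 is an assumption made for this sketch.
R_SUCCESS=0
R_DHCP_BG=12

# first_patch_says_bg DHCPRETCODE
# Condition from the first proposed patch: true for anything but success,
# including an empty (never assigned) value.
first_patch_says_bg() {
    [ "$1" != $R_SUCCESS ]
}

# refined_says_bg SCRIPTNAME BOOTPROTO DHCPRETCODE
# Refined condition: only ifstatus on a DHCP interface may report R_DHCP_BG.
refined_says_bg() {
    [ "$1" = ifstatus -a "$2" = dhcp -a "$3" != $R_SUCCESS ]
}

# lo with a static setup: DHCP never ran, so dhcpretcode is assumed to be empty.
if first_patch_says_bg ""; then
    echo "first patch: lo would exit with $R_DHCP_BG"
fi
if ! refined_says_bg ifup static ""; then
    echo "refined condition: lo falls through to the normal return code"
fi
```

With an empty first argument the first condition compares `""` against `0` and therefore succeeds, which would match the observed non-zero exit for lo.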
Thanks for the further investigation. I guess we just need to properly initialize dhcpretcode somewhere. Try this patch instead:

--- /sbin/ifup.orig 2005-11-25 17:53:40.000000000 +0100
+++ /sbin/ifup 2005-11-25 17:56:13.000000000 +0100
@@ -826,6 +826,7 @@
 # bringing up/down the interface (main part)
 #
+dhcpretcode=$R_SUCCESS
 # switch type. If SKIP_MAIN_PART == skip, don't execute any section
 case "$BOOTPROTO$SKIP_MAIN_PART" in
 dhcp+autoip|DHCP+AUTOIP)

Please try.

Regarding the timeouts: DHCLIENT_WAIT_AT_BOOT is just the time that ifup-dhcp waits before it puts the DHCP client into the background and returns with retval 12 (R_DHCP_BG). If ifup returns that value, rcnetwork does not consider it a success. Instead, it waits some additional seconds if this is a mandatory device. rcnetwork times out WAIT_FOR_INTERFACES seconds after it was started. So if the interface appears shortly before rcnetwork times out, the time left for DHCP might be very short. I will open another bug for this: bug 135569
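The timing issue described above comes down to simple arithmetic: the time left for DHCP is whatever remains of WAIT_FOR_INTERFACES when the interface appears. A small sketch follows, where `dhcp_window` is a hypothetical helper and the value 20 is only a placeholder, not a documented default.

```shell
# dhcp_window WAIT_FOR_INTERFACES APPEARED_AT
# Prints the seconds left for DHCP when the interface shows up APPEARED_AT
# seconds after rcnetwork started (0 if it appears at or after the deadline).
# Illustrative helper only; rcnetwork contains no such function.
dhcp_window() {
    left=$(( $1 - $2 ))
    [ "$left" -lt 0 ] && left=0
    echo "$left"
}

dhcp_window 20 2     # interface hotplugged early: prints 18
dhcp_window 20 18    # interface hotplugged late:  prints 2
```

In the second case the DHCP client would get only two seconds before rcnetwork gives up, regardless of the value of DHCLIENT_WAIT_AT_BOOT.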
Finally back at work and ready for testing. Your patch works fine (with mine removed, of course). For me the problem seems resolved. The problem with the timeouts is minor (at least if you understand the usage of the two configuration variables in question). If nothing else, the bug showed me that network initialisation is much more complex than one would think. :-)) Thanks for your help.
Fixed in SVN. Andreas, should I add this to the pending YOUpdate? I don't know if it happens on many machines.
Yes, add it to the pending YOUpdate if possible. Otherwise only for STABLE.
YOUpdate is finally submitted
This patch now conflicted with other fixes around STARTMODE=ifplugd (see bug 140124 and duplicates). Therefore I removed the (bad) return value magic at the very end of ifup. Instead, ifup-dhcp checks whether an ifup process is running on this interface and, if so, assumes that connecting is in progress.