Bugzilla – Bug 697929
ipv6, rcnetwork returns before dad completed
Last modified: 2012-02-17 22:00:22 UTC
Hi,

rcnetwork start returns success while IPv6 addresses are still in tentative mode (and therefore still unusable).

This causes later init scripts to fail if they start servers that want to listen on specific configured IPs. (In my case named does not start on boot.)

Workaround:
  setting the ip CONFFLAG nodad for such important IPs,
  or disabling DAD system-wide (sysctl),
  or just adding "sleep 5" at the end of rcnetwork start.

Currently I've added that sleep to my network init script. A better solution would be to wait for the IP address state to become permanent (success) or dadfailed (error).
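As a rough illustration of the "wait for permanent or dadfailed" idea, polling the output of ip could look like the sketch below. The function name dad_wait, the 100 ms poll interval and the return codes are assumptions made for this example and are not taken from any patch in this report. Note that it also returns 0 when the interface has no IPv6 address at all.

```shell
# Sketch only: dad_wait is a hypothetical helper, not the function from the patch.
# Usage: dad_wait IFACE [SECONDS]
# Returns 0 once no address is tentative, 1 on DAD failure, 2 on timeout.
dad_wait() {
    local iface=$1
    local loops=$(( ${2:-5} * 10 ))   # one poll every 100 ms (assumed interval)
    local out
    while :; do
        out=$(ip -6 addr show dev "$iface")
        case "$out" in
            *dadfailed*|*"flags 08"*) return 1 ;;  # duplicate found (older ip prints "flags 08")
            *tentative*) ;;                          # DAD still running, keep polling
            *) return 0 ;;                           # no tentative address left
        esac
        [ "$loops" -gt 0 ] || return 2               # timeout budget used up
        loops=$((loops - 1))
        sleep 0.1                                    # fractional sleep needs GNU sleep
    done
}
```

An init script could then call something like "dad_wait eth0 5 || echo 'eth0: IPv6 address not usable'" before starting services that bind to a specific address.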
This certainly also explains the problem with mounting NFS shares over IPv6 :-)
The probability of hitting this race is much greater if you have RUN_PARALLEL="yes" in /etc/sysconfig/boot, which is unfortunately the default.
Created attachment 451318 [details]
Patch to wait for link and ipv6 dad

RPM package will be in http://download.opensuse.org/repositories/home:/mtomaschewski:/branches:/openSUSE:/11.4:/Update:/Test/
Thanks for the report, Ruediger! There is an optional link_wait script (in scripts/), but it basically only allows sleeping, ... I've implemented waiting for both link and DAD ok/failure -- does it solve the issues for you?

Bruno, yes.
Thanks a lot Marius, looks like you had some work with it. I will review your patch and also test your rpm in practice. But this may take some time because I can't reboot the affected machines right now.
Yes, it is not trivial, as we don't have a daemon that would update the state later.

FYI: it doesn't work properly - it needs far too much time to start each interface + wait (ok for a single manual ifup, but not at boot) :-/ Especially when you have bridges -- like in my current setup:

    bond0<eth0,eth1> dhcp4 + dhcp6
    vlan30 br30 (STP=on)
    vlan40 br40 (STP=on)
    vlan50 br50 (STP=on)
    eth2 static

which is quite common today.

I'm working on rewriting it to start all interfaces first, then wait in a loop (done), and correctly update the status (still working on it at the moment) when all is fine.
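The "start everything first, then wait in one loop" approach can be sketched roughly as follows. This is only an illustration of the idea, not the code from the rewrite. The names iface_pending and wait_all and the 100 ms poll interval are assumptions, and an address that has failed DAD is simply treated as no longer pending.

```shell
# Sketch only: iface_pending and wait_all are hypothetical names.
# True while IFACE has an address that is tentative but has not failed DAD.
iface_pending() {
    ip -6 addr show dev "$1" | grep tentative | grep -Eqv 'dadfailed|flags 08'
}

# Usage: wait_all SECONDS IFACE...
# One shared timeout for all interfaces instead of one timeout per interface.
wait_all() {
    local loops=$(( $1 * 10 ))   # one poll every 100 ms (assumed interval)
    local pending iface
    shift
    while :; do
        pending=""
        for iface in "$@"; do
            if iface_pending "$iface"; then
                pending="$pending $iface"
            fi
        done
        if [ -z "$pending" ]; then
            return 0                               # nothing left to wait for
        fi
        if [ "$loops" -le 0 ]; then
            echo "still tentative:$pending" >&2    # report who timed out
            return 1
        fi
        loops=$((loops - 1))
        sleep 0.1                                  # fractional sleep needs GNU sleep
    done
}
```

With a shared loop the worst case at boot is a single timeout, for example "wait_all 20 bond0 br30 br40 br50 eth2", rather than the sum of the per-interface waits.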
Marius, do you have your private sysconfig clone published somewhere? I'd like to follow your work on this.
Created attachment 452941 [details]
Patch to wait for ipv6 dad in rcnetwork

This patch is on top of the previous one.

(In reply to comment #7)
> Marius, do you have your private sysconfig clone published somewhere?

You can access it at http://w3.suse.de/~mt/git/sysconfig.git/, branch 'opensuse-11.4-update-test', as soon as it has been mirrored to the outside.

> I'd like to follow your work on this.

Well, it didn't make sense to publish last week's version as it broke network starts...

The RPM package with the current version is in:
http://download.opensuse.org/repositories/home:/mtomaschewski:/branches:/openSUSE:/11.4:/Update:/Test/

Sorry that I forgot to enable the publish flag for the repository before :-/

It works fine for me and I'm going to apply it to openSUSE:Factory now.
This is an autogenerated message for OBS integration: This bug (697929) was mentioned in https://build.opensuse.org/request/show/84820 Factory / sysconfig
Created attachment 452988 [details] A fix for inverted link return value test
This is an autogenerated message for OBS integration: This bug (697929) was mentioned in https://build.opensuse.org/request/show/84863 Factory / sysconfig https://build.opensuse.org/request/show/84864 Factory / sysconfig https://build.opensuse.org/request/show/84865 Factory / sysconfig
Mr Maintenance, it is fixed in Factory -- are we going to provide an update for 11.4?
The SWAMPID for this issue is 43720. This issue was rated as low. Please submit fixed packages until 2011-11-15. Also create a patchinfo file using this link: https://swamp.suse.de/webswamp/wf/43720
ok for openSUSE, please submit there. needinfo SLE maint team
(In reply to comment #15) > ok for openSUSE, please submit there. needinfo SLE maint team OK, thanks!
This is an autogenerated message for OBS integration: This bug (697929) was mentioned in https://build.opensuse.org/request/show/90746 11.4 / sysconfig
This is an autogenerated message for OBS integration: This bug (697929) was mentioned in https://build.opensuse.org/request/show/90820 11.3 / sysconfig
Update released for: sysconfig, sysconfig-debuginfo, sysconfig-debugsource Products: openSUSE 11.3 (debug, i586, x86_64) openSUSE 11.4 (debug, i586, x86_64)
Updates for 11.3 and 11.4 released. I'll close the bug.
Hi, Ruediger Meier

I am testing the bug, but I know only a little about rcnetwork and IPv6.
So how did you discover the bug, and what are the steps to reproduce it?

(In reply to comment #0)
> Hi,
>
> rcnetwork start returns success when ipv6 addresses are still in tentative mode
> (still unusable).
>
> This causes later init scripts to fail if they are starting servers which want
> to listen to specific configured IPs.
> (In my case named does not start on boot.)
>
>
> Workaround:
> setting ip CONFFLAG nodad for such important IPs
> or disabling dad system wide (sysctl)
> or just add "sleep 5" at the end of rcnetwork start
>
> Currently I've added that sleep to my network init script.
> Better solution would be to wait for ip address state permanent (success) or
> dadfailed (error).
(In reply to comment #22)
> Hi, Ruediger Meier
>
> I am testing the bug, but I know only a little about rcnetwork and IPv6.
> So how did you discover the bug, and what are the steps to reproduce it?

Sorry, I forgot to attach a test description. Here are some hints on how to reproduce it quite easily:

First, the manual steps:

    ip link set dev eth0 down
    ip addr flush dev eth0
    --
    ip link set dev eth0 up
    ip addr add 2001:DB8:abba::1/64 dev eth0
    ip addr show

Every IPv6 address is initially visible as "tentative", e.g.:

    inet6 2001:DB8:abba::1/64 scope global tentative

After a few seconds, when the kernel has finished DAD, this flag will either disappear (success) or the address will additionally show "dadfailed" or "flags 08" (depending on the iproute2 version).

Set up a NIC with a static IPv6 address, e.g.:

    STARTMODE=auto
    BOOTPROTO='static'
    IPADDR='2001:DB8:abba::1/64'

then execute the following commands:

    rcnetwork stop -o boot
    rcnetwork start -o boot ; ip addr show # sleep 10 ; ip addr show

When you see tentative (without flags 08 or dadfailed), you have run into this problem. The second (commented out) "ip a s" after 10 secs should usually no longer show the tentative flag.

Now you can configure a service that makes use of the IP address [bind() to this IP address, usually via "Listen 2001:DB8:abba::1"]; this service may fail trying to use the tentative address when it gets started just after the network.

The fixed rcnetwork version may show it as:

    eth0 is up, but has tentative ipv6 address

but then waits until DAD has finished.
Further, "rcnetwork status" and "ifstatus eth0" report an extended status that can be one of:

    eth0 is up
    eth0 is up, but ipv6 duplicate address check failed
    eth0 is up, but has tentative ipv6 address
    eth0 is not up
    eth0 is dormant
    eth0 has no carrier

The fix also adds two variables that can be used to tune the behaviour (for problematic NICs or to disable the check) by adding them to the per-interface config files -- ifcfg-eth0 (or ifcfg-bond0, ...):

    ## Type:    integer
    ## Default: 0
    #
    # The number of seconds to wait for link to become useable / ready.
    # Default is 0, causing to not wait for a ready link (0), because link
    # detection can't be enabled in all cases (e.g. bridges without ports).
    # Please use per interface settings to enable it.
    #
    LINK_READY_WAIT=0

    ## Type:    integer
    ## Default: ""
    #
    # The number of seconds to wait for the end of IPv6 duplicate address
    # detection in ifup.
    # Default is to use WAIT_FOR_INTERFACES/2 seconds in normal ifup runs.
    # When ifup is called by /etc/init.d/network at boot time, the check
    # is done, but /etc/init.d/network waits WAIT_FOR_INTERFACES seconds
    # for all interfaces together. Set to 0 to disable it.
    #
    IPV6_DAD_WAIT=""
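For illustration, a per-interface file that combines the static setup from the reproduction steps with the two new variables might look like this. The values 5 and 10 are arbitrary examples chosen here, not recommendations from the patch.

```shell
# /etc/sysconfig/network/ifcfg-eth0 (example values only)
STARTMODE=auto
BOOTPROTO='static'
IPADDR='2001:DB8:abba::1/64'
# Wait up to 5 seconds for the link to become ready (default 0 = do not wait).
LINK_READY_WAIT=5
# Wait up to 10 seconds for IPv6 DAD to finish in ifup (0 disables the wait).
IPV6_DAD_WAIT=10
```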
(In reply to comment #22)
> Hi, Ruediger Meier
>
> I am testing the bug, but I know only a little about rcnetwork and IPv6.
> So how did you discover the bug, and what are the steps to reproduce it?

Besides Marius' posts above on how to "see" the tentative mode, you may also want to see what can happen in practice and why we need to wait at all. Try

    $ ip addr add 2001:DB8:abba::1/64 dev eth0
    $ ping6 -c1 -I 2001:DB8:abba::1 ::1
    ping: bind icmp socket: Cannot assign requested address

After some seconds the ping should work.
Ahem... while testing something else I noticed that the code waits only very briefly. I've found a bug -- the current patch waits only 1/10 of the specified time :-((

diff --git a/scripts/functions b/scripts/functions
index a12b91c..14d4104 100644
--- a/scripts/functions
+++ b/scripts/functions
@@ -196,7 +196,7 @@ link_ready_wait ()
     local iface=$1
     local -i wsecs=${2:-0}
     local -i uwait=25000
-    local -i loops=$(((wsecs * 100000) / $uwait))
+    local -i loops=$(((wsecs * 1000000) / $uwait))
     local -i loop=0 ret=0
     link_ready_check "$iface" ; ret=$?
@@ -212,7 +212,7 @@ ipv6_addr_dad_wait()
     local iface=$1
     local -i wsecs=${2:-0}
     local -i uwait=25000
-    local -i loops=$(((wsecs * 100000) / $uwait))
+    local -i loops=$(((wsecs * 1000000) / $uwait))
     local -i loop=0 ret=0
     ipv6_addr_dad_check "$iface" ; ret=$?
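To make the factor of ten concrete, the loop count can be worked out by hand. The sketch below uses the uwait value from the diff. The helper names loops_old, loops_new and total_ms are made up for this example, and the time spent in the check itself is ignored.

```shell
uwait=25000   # microseconds slept per loop, as in the patch

# Loop count before the fix: seconds were scaled by 100000 instead of 1000000.
loops_old() { echo $(( ($1 * 100000) / uwait )); }
# Loop count after the fix: seconds correctly converted to microseconds.
loops_new() { echo $(( ($1 * 1000000) / uwait )); }
# Total sleep time in milliseconds for a given number of loops.
total_ms() { echo $(( $1 * uwait / 1000 )); }

# For a requested wait of 5 seconds:
echo "old: $(loops_old 5) loops = $(total_ms "$(loops_old 5)") ms"
echo "new: $(loops_new 5) loops = $(total_ms "$(loops_new 5)") ms"
```

For 5 seconds this gives 20 loops (500 ms) before the fix and 200 loops (5000 ms) after it.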
The SWAMPID for this issue is 44824. This issue was rated as moderate. Please submit fixed packages until 2012-01-19. Also create a patchinfo file using this link: https://swamp.suse.de/webswamp/wf/44824
This is an autogenerated message for OBS integration: This bug (697929) was mentioned in https://build.opensuse.org/request/show/99577 12.1 / sysconfig
This is an autogenerated message for OBS integration: This bug (697929) was mentioned in https://build.opensuse.org/request/show/99585 12.1 / sysconfig
This is an autogenerated message for OBS integration: This bug (697929) was mentioned in https://build.opensuse.org/request/show/99593 11.4 / sysconfig https://build.opensuse.org/request/show/99595 11.3 / sysconfig
request 99595 -> 11.3: bnc#697929,bnc#739338 request 99593 -> 11.4: bnc#697929,bnc#739338 request 99585 -> 12.1: bnc#739338,bnc#697929,bnc#733118,bnc#727771,bnc#734723
This is an autogenerated message for OBS integration: This bug (697929) was mentioned in https://build.opensuse.org/request/show/105293 Factory / sysconfig
This is an autogenerated message for OBS integration: This bug (697929) was mentioned in https://build.opensuse.org/request/show/105749 Evergreen:11.2 / sysconfig