Bug 697929

Summary: ipv6, rcnetwork returns before dad completed
Product: [openSUSE] openSUSE 11.4
Reporter: Ruediger Meier <sweet_f_a>
Component: Network
Assignee: Marius Tomaschewski <mt>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Major
Priority: P5 - None
CC: bruno, junguo.wang, maint-coord, meissner
Version: Final
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard: .
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Deadline: 2012-01-19
Attachments:
  Patch to wait for link and ipv6 dad
  Patch to wait for ipv6 dad in rcnetwork
  A fix for inverted link return value test

Description Ruediger Meier 2011-06-03 12:28:01 UTC
Hi,

rcnetwork start returns success while IPv6 addresses are still in tentative mode (still unusable).

This causes later init scripts to fail if they start servers that want to listen on specific configured IPs.
(In my case named does not start on boot.)


Workaround:
setting ip CONFFLAG nodad for such important IPs
or disabling dad system wide (sysctl)
or just add "sleep 5" at the end of rcnetwork start
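
As a rough sketch, the first two workarounds might look like this when done by hand. The function names are made up for illustration. "nodad" is an iproute2 address flag and accept_dad is a kernel sysctl, but whether the per-interface sysctl is enough on a given kernel is an assumption to verify:

```shell
# Workaround 1: add a single address with DAD skipped.
# usage: add_addr_nodad ADDR/PREFIX IFACE   (hypothetical helper name)
add_addr_nodad() {
    ip -6 addr add "$1" dev "$2" nodad
}

# Workaround 2: switch DAD off for one interface (accept_dad=0 disables DAD).
# usage: disable_dad IFACE                  (hypothetical helper name)
disable_dad() {
    sysctl -w "net.ipv6.conf.$1.accept_dad=0"
}
```

For example: add_addr_nodad 2001:DB8:abba::1/64 eth0 (as root). Both variants give up duplicate detection for the affected addresses, so they only hide the race.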

Currently I've added that sleep to my network init script.
A better solution would be to wait until the IP address state is permanent (success) or dadfailed (error).
Comment 1 Bruno Friedmann 2011-06-04 09:20:55 UTC
This certainly also explains the problem of mounting NFS IPv6 shares :-)
Comment 2 Ruediger Meier 2011-06-04 09:40:13 UTC
The probability of hitting this race is much greater if you have
RUN_PARALLEL="yes"
in /etc/sysconfig/boot, which is unfortunately the default.
Comment 3 Marius Tomaschewski 2011-09-16 13:24:19 UTC
Created attachment 451318 [details]
Patch to wait for link and ipv6 dad

RPM package will be in

http://download.opensuse.org/repositories/home:/mtomaschewski:/branches:/openSUSE:/11.4:/Update:/Test/
Comment 4 Marius Tomaschewski 2011-09-16 13:28:58 UTC
Thanks for the report Ruediger!

There is an optional link_wait script (in scripts/), but it basically only
allows you to sleep, ...

I've implemented waiting for both link and DAD ok/failure -- does it solve the
issues for you?

Bruno, yes.
Comment 5 Ruediger Meier 2011-09-20 09:39:23 UTC
Thanks a lot Marius, looks like this took you some work.

I will review your patch and also test your rpm in practice. But this may take some time because I can't reboot the affected machines right now.
Comment 6 Marius Tomaschewski 2011-09-23 13:04:39 UTC
Yes, it is not trivial as we don't have a daemon that would update
the state later.

FYI:
it doesn't work properly - it needs far too much time to start each
interface + wait (ok for single manual ifup, but not at boot) :-/

Especially when you have bridges -- like in my current setup:

  bond0<eth0,eth1>       dhcp4 + dhcp6
    vlan30
      br30 (STP=on)
    vlan40
      br40 (STP=on)
    vlan50
      br50 (STP=on)
  eth2                   static

that is quite common today.

I'm rewriting it to start all interfaces first, then wait in a loop
(done), and correctly update the status when all is fine (still working
on that at the moment).
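
The "start everything first, then wait once for all of them" idea could be sketched roughly like this. It is not the code from the patch: wait_all and the check callback are hypothetical names, and the one second polling interval is an arbitrary choice.

```shell
# wait_all CHECK TIMEOUT IFACE...
# Polls CHECK for every interface that is not ready yet, sharing one
# timeout (in seconds) across all interfaces instead of waiting per
# interface. Returns 0 when all are ready, 1 when the timeout expired.
wait_all() {
    local check=$1 timeout=$2 elapsed=0 pending still iface
    shift 2
    pending="$*"
    while [ -n "$pending" ]; do
        still=""
        for iface in $pending; do
            "$check" "$iface" || still="$still $iface"
        done
        pending=$still
        [ -z "$pending" ] && return 0
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

With a per-interface check function (for link or DAD state), a call such as "wait_all my_check 20 bond0 br30 br40 br50 eth2" costs at most one timeout in total, rather than one timeout per interface.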
Comment 7 Ruediger Meier 2011-09-23 15:49:37 UTC
Marius, do you have your private sysconfig clone published somewhere?
I'd like to follow your work on this.
Comment 8 Marius Tomaschewski 2011-09-26 10:00:38 UTC
Created attachment 452941 [details]
Patch to wait for ipv6 dad in rcnetwork

This patch is on top of the previous one.

(In reply to comment #7)
> Marius, do you have your private sysconfig clone published somewhere?

You can access it at http://w3.suse.de/~mt/git/sysconfig.git/, branch
'opensuse-11.4-update-test' as soon as it has been mirrored to outside.

> I'd like to follow your work on this.

Well, it didn't make sense to publish last week's version as it broke
network starts...

The RPM package with the current version is in:

http://download.opensuse.org/repositories/home:/mtomaschewski:/branches:/openSUSE:/11.4:/Update:/Test/

Sorry that I forgot to enable the publish flag for the repository earlier :-/

It works fine for me and I'm going to apply it to openSUSE:Factory now.
Comment 9 Bernhard Wiedemann 2011-09-26 11:00:10 UTC
This is an autogenerated message for OBS integration:
This bug (697929) was mentioned in
https://build.opensuse.org/request/show/84820 Factory / sysconfig
Comment 10 Marius Tomaschewski 2011-09-26 12:48:06 UTC
Created attachment 452988 [details]
A fix for inverted link return value test
Comment 11 Bernhard Wiedemann 2011-09-26 13:00:13 UTC
This is an autogenerated message for OBS integration:
This bug (697929) was mentioned in
https://build.opensuse.org/request/show/84863 Factory / sysconfig
https://build.opensuse.org/request/show/84864 Factory / sysconfig
https://build.opensuse.org/request/show/84865 Factory / sysconfig
Comment 12 Marius Tomaschewski 2011-10-13 13:10:43 UTC
Mr Maintenance,

it is fixed in factory -- are we going to provide an update for 11.4?
Comment 14 Swamp Workflow Management 2011-10-18 15:58:40 UTC
The SWAMPID for this issue is 43720.
This issue was rated as low.
Please submit fixed packages until 2011-11-15.
Also create a patchinfo file using this link:
https://swamp.suse.de/webswamp/wf/43720
Comment 15 Marcus Meissner 2011-10-18 15:59:21 UTC
ok for openSUSE, please submit there. needinfo SLE maint team
Comment 17 Marius Tomaschewski 2011-10-19 14:29:04 UTC
(In reply to comment #15)
> ok for openSUSE, please submit there. needinfo SLE maint team

OK, thanks!
Comment 18 Bernhard Wiedemann 2011-11-09 10:00:07 UTC
This is an autogenerated message for OBS integration:
This bug (697929) was mentioned in
https://build.opensuse.org/request/show/90746 11.4 / sysconfig
Comment 19 Bernhard Wiedemann 2011-11-09 15:00:08 UTC
This is an autogenerated message for OBS integration:
This bug (697929) was mentioned in
https://build.opensuse.org/request/show/90820 11.3 / sysconfig
Comment 20 Swamp Workflow Management 2011-11-15 17:24:54 UTC
Update released for: sysconfig, sysconfig-debuginfo, sysconfig-debugsource
Products:
openSUSE 11.3 (debug, i586, x86_64)
openSUSE 11.4 (debug, i586, x86_64)
Comment 21 Benjamin Brunner 2011-11-15 17:25:56 UTC
Updates for 11.3 and 11.4 released. I'll close the bug.
Comment 22 jun wang 2012-01-05 07:56:01 UTC
Hi, Ruediger Meier

I am testing this bug, but I know only a little about rcnetwork and IPv6.
How did you discover the bug, and what are the steps to reproduce it?
Comment 23 Marius Tomaschewski 2012-01-05 09:42:09 UTC
(In reply to comment #22)
> Hi, Ruediger Meier
> 
> I am testing the bug, but I knew a little about rcnetwork and ipv6.
> So how did you discovered the bug? and what steps to reproduce it?

Sorry, I forgot to attach a test description.

Here are some hints on how to reproduce it quite easily:

First, manual steps:

  ip link set dev eth0 down
  ip addr flush dev eth0
  --
  ip link set dev eth0 up
  ip addr add 2001:DB8:abba::1/64 dev eth0
  ip addr show

Every IPv6 address is initially shown as "tentative", e.g.:

 inet6 2001:DB8:abba::1/64 scope global tentative

After a few seconds, when the kernel has finished DAD, this flag will
either disappear (success) or be joined by "dadfailed" or
"flags 08" (depending on the iproute2 version).

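A small helper along these lines could classify the state from the "ip" output. This is only a sketch, not what the patch does: dad_state is a made-up name, and it simply pattern-matches the text printed by "ip -6 addr show", treating both "dadfailed" and "flags 08" as failure as described above.

```shell
# dad_state IFACE
# Prints "failed" if any IPv6 address on IFACE failed DAD, "tentative"
# if at least one is still being checked, otherwise "ok".
dad_state() {
    ip -6 addr show dev "$1" | {
        state=ok
        while IFS= read -r line; do
            case "$line" in
                *inet6*dadfailed*|*inet6*"flags 08"*)
                    state=failed ;;
                *inet6*tentative*)
                    # a failure seen on an earlier line wins
                    [ "$state" = failed ] || state=tentative ;;
            esac
        done
        echo "$state"
    }
}
```

Calling "dad_state eth0" right after adding the address should print "tentative", and "ok" a few seconds later if no duplicate was found.
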

Set up a NIC with a static IPv6 address, e.g.:

STARTMODE=auto
BOOTPROTO='static'
IPADDR='2001:DB8:abba::1/64'

then execute the following commands:

  rcnetwork stop -o boot
  rcnetwork start -o boot ; ip addr show

  # sleep 10 ; ip addr show

When you see tentative (without flags 08 or dadfailed), you have run
into the problem described here.
The second (commented out) "ip addr show" after 10 seconds should usually
not show the tentative flag any more.

Now you can configure a service that makes use of the IP address
[bind() to this IP address, usually via "Listen 2001:DB8:abba::1"]
and this service may fail when it tries to use the tentative address
because it gets started right after the network.

The fixed rcnetwork version may show it as:
    eth0        is up, but has tentative ipv6 address
but then waits until DAD has finished.
Comment 24 Marius Tomaschewski 2012-01-05 09:56:24 UTC
Further, "rcnetwork status" and "ifstatus eth0" report an extended status,
which can be one of:

  eth0   is up
  eth0   is up, but ipv6 duplicate address check failed
  eth0   is up, but has tentative ipv6 address
  eth0   is not up
  eth0   is dormant
  eth0   has no carrier

The fix also adds two variables that can be used to tune the behaviour
(for problematic NICs or to disable the check) by adding them to the
per interface config files -- ifcfg-eth0 (or ifcfg-bond0, ...):

## Type:        integer
## Default:     0
#
# The number of seconds to wait for link to become useable / ready.
# Default is 0, causing to not wait for a ready link (0), because link
# detection can't be enabled in all cases (e.g. bridges without ports).
# Please use per interface settings to enable it.
#
LINK_READY_WAIT=0

## Type:        integer
## Default:     ""
#
# The number of seconds to wait for the end of IPv6 duplicate address
# detection in ifup.
# Default is to use WAIT_FOR_INTERFACES/2 seconds in normal ifup runs.
# When ifup is called by /etc/init.d/network at boot time, the check
# is done, but /etc/init.d/network waits WAIT_FOR_INTERFACES seconds
# for all interfaces together. Set to 0 to disable it.
#
IPV6_DAD_WAIT=""
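
For illustration, a per interface file using both variables might look like the following. The wait values are arbitrary examples, not recommendations, and the rest of the file is taken from the reproduction setup earlier in this report:

```shell
# /etc/sysconfig/network/ifcfg-eth0 (illustrative values only)
STARTMODE='auto'
BOOTPROTO='static'
IPADDR='2001:DB8:abba::1/64'
# wait up to 5 seconds for the link to become ready (default 0 = don't wait)
LINK_READY_WAIT=5
# wait up to 10 seconds for IPv6 DAD to finish in ifup (0 disables the wait)
IPV6_DAD_WAIT=10
```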
Comment 25 Ruediger Meier 2012-01-05 10:50:36 UTC
(In reply to comment #22)
> Hi, Ruediger Meier
> 
> I am testing the bug, but I knew a little about rcnetwork and ipv6.
> So how did you discovered the bug? and what steps to reproduce it?

Besides Marius' posts above on how to "see" the tentative mode, you may also want to see what can happen in practice and why we need to wait at all.
Try

$ ip addr add 2001:DB8:abba::1/64 dev eth0
$ ping6 -c1 -I 2001:DB8:abba::1  ::1
ping: bind icmp socket: Cannot assign requested address

After a few seconds the ping should work.
Comment 26 Marius Tomaschewski 2012-01-05 16:15:24 UTC
Ahm... while testing something else I noticed that the code waits only very briefly.
I've found a bug -- the current patch waits only 1/10 of the specified
time :-((

diff --git a/scripts/functions b/scripts/functions
index a12b91c..14d4104 100644
--- a/scripts/functions
+++ b/scripts/functions
@@ -196,7 +196,7 @@ link_ready_wait ()
        local iface=$1
        local -i wsecs=${2:-0}
        local -i uwait=25000
-       local -i loops=$(((wsecs * 100000) / $uwait))
+       local -i loops=$(((wsecs * 1000000) / $uwait))
        local -i loop=0 ret=0
 
        link_ready_check "$iface" ; ret=$?
@@ -212,7 +212,7 @@ ipv6_addr_dad_wait()
        local iface=$1
        local -i wsecs=${2:-0}
        local -i uwait=25000
-       local -i loops=$(((wsecs * 100000) / $uwait))
+       local -i loops=$(((wsecs * 1000000) / $uwait))
        local -i loop=0 ret=0
 
        ipv6_addr_dad_check "$iface" ; ret=$?
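
To make the factor of ten concrete, here is the loop-count calculation from the diff evaluated for an example wait of 5 seconds. It assumes that each loop iteration sleeps for uwait microseconds, which the variable names suggest but the diff context does not show:

```shell
wsecs=5        # requested wait in seconds (example value)
uwait=25000    # polling interval in microseconds, as in the patch

old_loops=$(( (wsecs * 100000) / uwait ))     # before the fix
new_loops=$(( (wsecs * 1000000) / uwait ))    # after the fix

# total wait = loops * uwait, converted from microseconds to milliseconds
echo "old: $old_loops loops = $(( old_loops * uwait / 1000 )) ms"
echo "new: $new_loops loops = $(( new_loops * uwait / 1000 )) ms"
# prints:
#   old: 20 loops = 500 ms
#   new: 200 loops = 5000 ms
```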
Comment 27 Swamp Workflow Management 2012-01-05 17:20:48 UTC
The SWAMPID for this issue is 44824.
This issue was rated as moderate.
Please submit fixed packages until 2012-01-19.
Also create a patchinfo file using this link:
https://swamp.suse.de/webswamp/wf/44824
Comment 28 Bernhard Wiedemann 2012-01-10 10:00:22 UTC
This is an autogenerated message for OBS integration:
This bug (697929) was mentioned in
https://build.opensuse.org/request/show/99577 12.1 / sysconfig
Comment 29 Bernhard Wiedemann 2012-01-10 11:00:08 UTC
This is an autogenerated message for OBS integration:
This bug (697929) was mentioned in
https://build.opensuse.org/request/show/99585 12.1 / sysconfig
Comment 30 Bernhard Wiedemann 2012-01-10 12:00:08 UTC
This is an autogenerated message for OBS integration:
This bug (697929) was mentioned in
https://build.opensuse.org/request/show/99593 11.4 / sysconfig
https://build.opensuse.org/request/show/99595 11.3 / sysconfig
Comment 31 Marius Tomaschewski 2012-01-10 12:54:04 UTC
request 99595 -> 11.3: bnc#697929,bnc#739338
request 99593 -> 11.4: bnc#697929,bnc#739338
request 99585 -> 12.1: bnc#739338,bnc#697929,bnc#733118,bnc#727771,bnc#734723
Comment 32 Bernhard Wiedemann 2012-02-15 20:00:11 UTC
This is an autogenerated message for OBS integration:
This bug (697929) was mentioned in
https://build.opensuse.org/request/show/105293 Factory / sysconfig
Comment 33 Bernhard Wiedemann 2012-02-17 22:00:22 UTC
This is an autogenerated message for OBS integration:
This bug (697929) was mentioned in
https://build.opensuse.org/request/show/105749 Evergreen:11.2 / sysconfig