Bugzilla – Bug 972471
Slow boot: 33s waiting for wicked interfaces to start
Last modified: 2018-02-17 04:46:49 UTC
A few days ago I ran a system update through 'zypper up' and I noticed that systemd and wicked were updated. Since then every time my system boots slowly because of this: Code: A start job is running for wicked managed network interfaces (33s / no limit)[ OK ] I haven't changed any network settings whatsoever. Before the update my system used to boot in about 15 seconds and I never had to wait for anything like that. Now the boot is x3 slower which is quite annoying as I am using a dual boot. Others confirmed the same problem in the forums: https://forums.opensuse.org/showthread.php/514501-Slow-boot-33s-waiting-for-wicked-interfaces-to-start
I have encountered this problem on a new install of Leap 42.1 on an Acer XC-705. In my case the wicked delay went away after I disabled IPv6 in YaST. There is still a noticeable delay that seems to be attributable to ModemManager and/or SuSE_firewall2 I started a thread on the opensuse mailing list that has attracted a lot of comments from people with similar observations (as well as a lot of other comment!). https://lists.opensuse.org/opensuse/2016-04/msg00009.html
Disabling IPv6 is not a solution, maybe just a workaround. I still want to be able to access Internet hosts through IPv6. Why should we disable a whole network protocol just to get a normal boot time?
Sorry, I didn't say, or mean to imply, that disabling IPv6 was a solution. Simply that it may be a factor in causing the delay. I'm just providing information to increase the number of people experiencing the bug visible to those with the ability to do something about it.
I believe that IPv6 related delay got fixed by the following commit: https://github.com/openSUSE/wicked/commit/079fa6210cbb7f5dee687aba21f50d0e19f8956b with version 0.6.30. What wicked version are you testing on? Could you provide some logs: https://en.opensuse.org/openSUSE:Bugreport_wicked
# rpm -q wicked wicked-0.6.30-6.1.x86_64 # rpm -q wicked-service wicked-service-0.6.30-6.1.x86_64
(In reply to Dave Howorth from comment #1) > I have encountered this problem on a new install of Leap 42.1 on an Acer > XC-705. In my case the wicked delay went away after I disabled IPv6 in YaST. > > There is still a noticeable delay that seems to be attributable to > ModemManager and/or SuSE_firewall2 SuSEfirewall2 as wicked is not using ModemManager yet. Does it also happen when you disable SuSEfirewall2(_init) services? Also, try to set: /etc/sysconfig/SuSEfirewall2, FW_BOOT_FULL_INIT=yes The SuSEfirewall2_init script blocks most icmpv6 traffic including IPv6 RA used by the kernel to set default route + autoip and start dhcpv6 [managed mode] processing in wicked. FW_BOOT_FULL_INIT=yes causes to apply final rules already in the init phase, so the packets are not blocked while the setup. Except of this, the kernel forgets to send a NEWLINK when it receives RA in some cases. When there is also a prefix in the RA, wicked will workaround it, but when not, it does not get informed. See also: http://patchwork.ozlabs.org/patch/512486/ http://patchwork.ozlabs.org/patch/512646/ which still aren't applied to the kernels...
// just clearing the NEEDINFO status as I already sent the info
Please disable SuSEfirewall2, enable debug, reboot and attach "wicked show-config" and "journalctl -b -o short-precise" outputs when it still happens with disabled SuSEfirewall2, see also: https://en.opensuse.org/openSUSE:Bugreport_wicked
(In reply to Marius Tomaschewski from comment #8) > Please disable SuSEfirewall2, enable debug, reboot and attach "wicked > show-config" and "journalctl -b -o short-precise" outputs when it still > happens with disabled SuSEfirewall2, see also: > > https://en.opensuse.org/openSUSE:Bugreport_wicked The link explains how to enable debug. How do I disable it after that and restore things back to the original (non-debugging) state?
Reset to empty variables as specified in "## Default: " tags in the /etc/sysconfig/network/config: WICKED_DEBUG='' WICKED_LOG_LEVEL='' BTW: WICKED_DEBUG='all' automatically sets WICKED_LOG_LEVEL=debug, it is explicit for the case, another level were set already before.
(In reply to Marius Tomaschewski from comment #8) > Please disable SuSEfirewall2, enable debug, reboot and attach "wicked > show-config" and "journalctl -b -o short-precise" outputs when it still > happens with disabled SuSEfirewall2, see also: > > https://en.opensuse.org/openSUSE:Bugreport_wicked FYI, in don't use the susefirewall, but I can confirm the 30s wait as described by George and Dave.
Created attachment 671427 [details] output from wicked show-config
Created attachment 671428 [details] journalctl -b -o short-precise
Here is my wicked debug info: https://goo.gl/Di6TXG After disabling debug (restoring to original as explained) I also tested with IPv6 disabled (with and without SuSE_firewall2). In all cases - absolutely no change. Still waiting for 33 seconds.
After upgrade to Leap the interface naming scheme changes: interfaces by default are named again as eth0, eth1, ... . I see you still have olf config files: ifcfg-eno1 etc. Therefore a wicked client waits for $WAIT_FOR_INTERFACES (30s) timeout for all interfaces that have configs specified (even though the interface with old names are not to show up). Please verify what you interfaces names are and leave only relevant configs (or change the STARTMODE to off or manual in the old ones).
(In reply to Pawel Wieczorkiewicz from comment #15) > After upgrade to Leap the interface naming scheme changes: interfaces by > default are named again as eth0, eth1, ... . > > I see you still have olf config files: ifcfg-eno1 etc. Therefore a wicked > client waits for $WAIT_FOR_INTERFACES (30s) timeout for all interfaces that > have configs specified (even though the interface with old names are not to > show up). > Please verify what you interfaces names are and leave only relevant configs > (or change the STARTMODE to off or manual in the old ones). When I open YaST->Network settings I see eth0 and eth1. What are you talking about? Can you please explain how to fix the thing you explain? (sorry, I am not an expert)
(In reply to Name Deleted from comment #16) > When I open YaST->Network settings I see eth0 and eth1. What are you talking > about? > > Can you please explain how to fix the thing you explain? (sorry, I am not an > expert) Sure. Please inspect manually a directory /etc/sysconfig/network. Look for ifcfg-NAME files that do not correspond to the interfaces present in your system. I see in the logs you provided that your system still has: /etc/sysconfig/network/ifcfg-enp2s0 /etc/sysconfig/network/ifcfg-eno1 (remove them) along the new ones: /etc/sysconfig/network/ifcfg-eth0 /etc/sysconfig/network/ifcfg-eth1 (which are ok and should stay)
(In reply to Pawel Wieczorkiewicz from comment #17) > Sure. Please inspect manually a directory /etc/sysconfig/network. Look for > ifcfg-NAME files that do not correspond to the interfaces present in your > system. > I see in the logs you provided that your system still has: > /etc/sysconfig/network/ifcfg-enp2s0 > /etc/sysconfig/network/ifcfg-eno1 > (remove them) > > along the new ones: > /etc/sysconfig/network/ifcfg-eth0 > /etc/sysconfig/network/ifcfg-eth1 > (which are ok and should stay) Thanks. Done and rebooted. No change at all - still waiting 33 sec for wicked...
(In reply to Name Deleted from comment #18) > > Thanks. Done and rebooted. > No change at all - still waiting 33 sec for wicked... I see that eth0 has no link up event: Apr 04 12:24:43.304117 i7 wickedd-nanny[1350]: eth0: state=device-up want=network-up, wait-for=link-up Is this interface connected to some network? If not (or there is no carrier on the cable of that network), please append: LINK_REQUIRED=no into /etc/sysconfig/network/ifcfg-eth0 and try over.
(In reply to Pawel Wieczorkiewicz from comment #19) > I see that eth0 has no link up event: > Apr 04 12:24:43.304117 i7 wickedd-nanny[1350]: eth0: state=device-up > want=network-up, wait-for=link-up > > Is this interface connected to some network? If not (or there is no carrier > on the cable of that network), please append: > LINK_REQUIRED=no > into /etc/sysconfig/network/ifcfg-eth0 and try over. eth0 is my second interface. It is used for a second internal LAN (which is not always available and online, I turn on the network switch only when I need to share the Internet to other computers). I have not changed its configuration in the 6 months at all. So it has always been the way it is and never caused this delay. Should I still do something after this explanation?
(In reply to Per Jessen from comment #13) > Created attachment 671428 [details] > journalctl -b -o short-precise Either there is still something blocking router icmpv6 responses or the kernel didn't forwarded it to userspace (patches mentioned in comment 6). It is not visible in the logs, but please disable the firewall using: systemctl disable SuSEfirewall2.service SuSEfirewall2_init.service Maybe this helps to see what's wrong: Write a /etc/wicked/scripts/pre-up/ip6tables-dump script with following content: #!/bin/bash set -e action="$1" ifname="$2" test "X$ifname" != "X" -a -d "/sys/class/net/$ifname" test -x /usr/sbin/ip6tables-save echo "*** $(date -u):" &>> "/tmp/ip6tables.${ifname}.${action}.dump" /usr/sbin/ip6tables-save &>> "/tmp/ip6tables.${ifname}.${action}.dump" echo "" &>> "/tmp/ip6tables.${ifname}.${action}.dump" ip addr show dev "$ifname" &>> "/tmp/ip6tables.${ifname}.${action}.dump" echo "" &>> "/tmp/ip6tables.${ifname}.${action}.dump" make it executable: chmod +x /etc/wicked/scripts/pre-up/ip6tables-dump And add either: PRE_UP_SCRIPT="wicked:pre-up" or PRE_UP_SCRIPT="wicked:pre-up/ip6tables-dump" to ifcfg-eth0 file. After reboot, attach /tmp/ip6tables.eth0.pre-up. Further, installing "radvd" and running "radvdump" for a while (10min) and/or calling "ip link set down eth0 ; ip link set up eth0" would show what is in the RA. Workaround: You can provide the IPv6 RA managed/other-config flags explicitly in the ifcfg-eth0 by adding DHCLIENT6_MODE='managed' to it. (In reply to Name Deleted from comment #14) > Here is my wicked debug info: > > https://goo.gl/Di6TXG > > After disabling debug (restoring to original as explained) I also tested > with IPv6 disabled (with and without SuSE_firewall2). In all cases - > absolutely no change. Still waiting for 33 seconds. You have configuration for ifcfg-{eno1,eth0,enp2s0,eth1} ethernet interfaces and even when the eth1 setup basically finished after 4 seconds: Apr 04 12:24:39.018645 i7 wicked[1351]: Executing: /usr/sbin/wicked --systemd ifup all ... Apr 04 12:24:43.303684 i7 wickedd-nanny[1350]: eth1: successfully transitioned from addrconf-up to network-up Apr 04 12:24:43.303904 i7 wicked[1351]: received signal progressInfo; object_path=/org/opensuse/Network/Nanny/Interface/3; target_state=network-up, state_name=network-up it still waits for the other requested interfaces: Apr 04 12:24:43.304117 i7 wickedd-nanny[1350]: eth0: state=device-up want=network-up, wait-for=link-up Apr 04 12:24:43.304330 i7 wickedd-nanny[1350]: waiting for 1 devices to become ready (1 explicitly requested) Apr 04 12:24:43.304571 i7 wickedd-nanny[1350]: ni_nanny_recheck(eno1) Apr 04 12:24:43.304771 i7 wickedd-nanny[1350]: eno1: no applicable policies Apr 04 12:24:43.304983 i7 wickedd-nanny[1350]: ni_nanny_recheck(enp2s0) Apr 04 12:24:43.305200 i7 wickedd-nanny[1350]: enp2s0: no applicable policies until the timeout is reached: Apr 04 12:25:09.087723 i7 wicked[1351]: Interface wait time reached This has not much in common with ipv6. Not ignoring missed STARTMODE=hotplug configs is bug #945601. But in your case, it are even STARTMODE=boot devices, so it works as expected here. An rm -v /etc/sysconfig/network/ifcfg-{eno1,eth0,enp2s0} will stop waiting for eno1 enp2s0 (which are eth0 + eth1 probably) + eth0. When eth0 is used, but currently not connected, you mv ifcfg-eth0 .ifcfg-eth0 as workaround to avoid the timeout until STARTMODE=hotplug & co is fixed. It would be possible to set LINK_REQUIRED=no for eth0, but this is generally IMO not a good idea (except in special cases) and configures IPs + routes on unconnected interfaces (-> black hole) and ignoring network layer order. I'd tend to resolve this report as dup of #945601 (even the config is not using STARTMODE=hotplug). Does it make sense to split the report into multiple? We have already two different scenarios here.
(In reply to Marius Tomaschewski from comment #21) As noted in a previous reply I have already removed eno1 and enp2s0 but that didn't change anything. As for eth1 causing the timeout: The question is why has it never caused such waiting so far and suddenly after the system update 2-3 weeks ago it is causing a problem? I don't know what is the proper configuration. I just noticed the long wait and reported. The actual network structure is: eth1 - connected to the main router (so called external) eth0 - second internal LAN which as I explained is not always online, i.e. the net switch is turned on only when the second LAN is needed Please let me know if I really need to change anything in config. As for splitting the bug - I don't know. As long as it gets fixed, it is up to you I guess.
(In reply to Marius Tomaschewski from comment #21) > (In reply to Per Jessen from comment #13) > > Created attachment 671428 [details] > > journalctl -b -o short-precise > > Either there is still something blocking router icmpv6 responses or the > kernel didn't forwarded it to userspace (patches mentioned in comment 6). > > It is not visible in the logs, but please disable the firewall using: > systemctl disable SuSEfirewall2.service SuSEfirewall2_init.service Hi Marius sorry, I forgot to mention it: # systemctl status SuSEfirewall2 SuSEfirewall2.service - SuSEfirewall2 phase 2 Loaded: loaded (/usr/lib/systemd/system/SuSEfirewall2.service; disabled) Active: inactive (dead) > Maybe this helps to see what's wrong: Write a > /etc/wicked/scripts/pre-up/ip6tables-dump > script with following content: > [snip] > > And add either: > PRE_UP_SCRIPT="wicked:pre-up" > or > PRE_UP_SCRIPT="wicked:pre-up/ip6tables-dump" > > to ifcfg-eth0 file. After reboot, attach /tmp/ip6tables.eth0.pre-up. Okay, willdo. I'm pretty certain there will be nothing there. > Further, installing "radvd" and running "radvdump" for a while (10min) > and/or calling "ip link set down eth0 ; ip link set up eth0" would > show what is in the RA. I'm running radvdump now. I have access to radvd on the router too. > Workaround: You can provide the IPv6 RA managed/other-config flags > explicitly in the ifcfg-eth0 by adding DHCLIENT6_MODE='managed' to it. Oh, the boot-time isn't such a big deal for me, it's just weird.
Created attachment 671462 [details] output from the pre-up script
Created attachment 671464 [details] radvdump
> I believe that IPv6 related delay got fixed by the following commit: > https://github.com/openSUSE/wicked/commit > /079fa6210cbb7f5dee687aba21f50d0e19f8956b > with version 0.6.30. Dunno > What wicked version are you testing on? Could you provide some logs: > https://en.opensuse.org/openSUSE:Bugreport_wicked It would be useful if that page was explicit about what 'logs' it wanted. This bugzilla page could be clearer about how to create attachments, but hopefully they will follow this note.
Created attachment 671720 [details] config-ifcfg.tgz
Created attachment 671721 [details] config-dump.log
Created attachment 671722 [details] status.log
Created attachment 671723 [details] wicked.log
Created attachment 671724 [details] ip_addr.log
Created attachment 671725 [details] routes.log
Created attachment 671726 [details] hwinfo-netcard.log
(In reply to Marius Tomaschewski from comment #6) > (In reply to Dave Howorth from comment #1) > > I have encountered this problem on a new install of Leap 42.1 on an Acer > > XC-705. In my case the wicked delay went away after I disabled IPv6 in YaST. > > > > There is still a noticeable delay that seems to be attributable to > > ModemManager and/or SuSE_firewall2 > > SuSEfirewall2 as wicked is not using ModemManager yet. Not quite sure what you meant there. > Does it also happen when you disable SuSEfirewall2(_init) services? No. I shall re-enable the firewall and stop there until I hear what else is needed. > Also, try to set: > /etc/sysconfig/SuSEfirewall2, FW_BOOT_FULL_INIT=yes What follows is TL;DR or more to the point I didn't try to understand. I don't know whether the firewall delay has any connection to the wicked delay. > The SuSEfirewall2_init script blocks most icmpv6 traffic including IPv6 RA > used by the kernel to set default route + autoip and start dhcpv6 [managed > mode] processing in wicked. > FW_BOOT_FULL_INIT=yes causes to apply final rules already in the init phase, > so the packets are not blocked while the setup. > > Except of this, the kernel forgets to send a NEWLINK when it receives RA in > some cases. When there is also a prefix in the RA, wicked will workaround it, > but when not, it does not get informed. See also: > http://patchwork.ozlabs.org/patch/512486/ > http://patchwork.ozlabs.org/patch/512646/ > which still aren't applied to the kernels...
(In reply to Marius Tomaschewski from comment #21) I tried to set STARTUPMODE=hotplug for eth0 but that didn't change anything. Still 33 sec waiting on boot.
(In reply to Per Jessen from comment #23) > # systemctl status SuSEfirewall2 It is SuSEfirewall2_init.service which blocks everything. (In reply to Dave Howorth from comment #34) > (In reply to Marius Tomaschewski from comment #6) > > (In reply to Dave Howorth from comment #1) > > > I have encountered this problem on a new install of Leap 42.1 on an Acer > > > XC-705. In my case the wicked delay went away after I disabled IPv6 in YaST. > > > > > > There is still a noticeable delay that seems to be attributable to > > > ModemManager and/or SuSE_firewall2 > > > > SuSEfirewall2 as wicked is not using ModemManager yet. > > Not quite sure what you meant there. ?? Wicked is not using ModemManager yet, so this cannot be the issue IMO. SuSEfirewall2_init.service is a known reason as it drops mandatory IPv6 packets. > > Does it also happen when you disable SuSEfirewall2(_init) services? > > No. I shall re-enable the firewall and stop there until I hear what > else is needed. Try to reproduce with disabled and provide logs showing them, e.g.: systemctl disable SuSEfirewall2_init SuSEfirewall2 ; reboot When you cannot reproduce, enable them again and try to reproduce. When you can, set FW_BOOT_FULL_INIT=yes SuSEfirewall2 variable to apply all rules in SuSEfirewall2_init.service already, instead to block IPv6 packets, until SuSEfirewall2.service applied the full rule set permitting them again. In your logs, there is e.g.: Apr 05 22:07:25.685889 acer-suse kernel: SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=d8:cb:8a:9c:3a:a5:58:98:35:99:90:3e:08:00 SRC=192.168.1.254 DST=192.168.1.71 LEN=307 TOS=0x00 PREC=0x00 TTL=64 ID=15254 PROTO=UDP SPT=67 DPT=68 LEN=287 that is SuSEfirewall blocked a DHCPv4 packet in this case. At boot time, there was: * SuSEfirewall2_init starts and blocks: Apr 03 22:12:29.500129 acer-suse SuSEfirewall2[1210]: Firewall rules set to CLOSE. ... * wicked is trying to setup: Apr 03 22:12:32.493129 acer-suse wickedd-dhcp4[1216]: eth0: Request to acquire DHCPv4 lease with UUID 3d870157-e0a6-0b00-c204-000004000000 Apr 03 22:12:39.421154 acer-suse wickedd-dhcp4[1216]: eth0: Committed DHCPv4 lease with address 192.168.1.71 (lease time 86400 sec, renew in 43200 sec, rebind in 75600 sec) Apr 03 22:12:39.457411 acer-suse wickedd[1218]: eth0: Notified neighbours about IP address 192.168.1.71 * wicked finished and reports: Apr 03 22:12:39.540869 acer-suse wicked[1220]: lo up Apr 03 22:12:39.541033 acer-suse wicked[1220]: eth0 up * SuSEfirewall2 starts and permits IPv6 now: Apr 03 22:12:39.559568 acer-suse SuSEfirewall2[1508]: Setting up rules from /etc/sysconfig/SuSEfirewall2 ... Apr 03 22:12:39.717861 acer-suse SuSEfirewall2[1604]: Firewall rules successfully set So when you enable SuSEfirewall2(_init) with FW_BOOT_FULL_INIT=no, it will block IPv6 and wicked is waiting the full timeout. After, SuSEfirewall2 service starts and permits them and nanny, which continues with the ifup in background, will configure after a while (on next router announcement). > > Also, try to set: > > /etc/sysconfig/SuSEfirewall2, FW_BOOT_FULL_INIT=yes > > What follows is TL;DR or more to the point I didn't try to understand. > I don't know whether the firewall delay has any connection to the wicked > delay. But I know that is does -- that's why there is also a bug open for this. The debug logs you provided do not show the issue / any timeout, because IPv6 was disabled and you didn't rebooted / triggered the issue. Whe manual ifup visible in the logs after debug has been enabled needed 1 second to renew the ipv4 only configuration. One of the dhcpv4 packets have been dropped while this attempt, but as dhcp4 cannot use using normal TCP/IP, but captures lower-layer frames directly (same to tcpdump), it were able to see even a dropped packet. In IPv6 case, using packet capture is not needed and neither kernel nor wicked is doing this, but are using normal IPv6 multicasts from fe80 address as source, which exists on any ipv6 interface. (In reply to Dave Howorth from comment #26) > > What wicked version are you testing on? Could you provide some logs: > > https://en.opensuse.org/openSUSE:Bugreport_wicked > > It would be useful if that page was explicit about what 'logs' it wanted. > This bugzilla page could be clearer about how to create attachments, but > hopefully they will follow this note. This depends on the issue, but usually all kernel (debug), systemd (debug), wicked debug, wpa_supplicant (debug) and sometimes also other logs, so basically the complete "journalctl -b -o short-precise" logs --- where the issue is visible / where the issue happened. (In reply to Name Deleted from comment #22) > (In reply to Marius Tomaschewski from comment #21) > > As noted in a previous reply I have already removed eno1 and enp2s0 but that > didn't change anything. > > As for eth1 causing the timeout: The question is why has it never caused > such waiting so far and suddenly after the system update 2-3 weeks ago it is > causing a problem? The delay with configuration for non-existing devices or without carrier was always there -- wicked simply waits until they interface appear / get carrier to setup them. Except of this, I don't know what all you've reconfigured installed, updated or changed in this time. Perhaps you just did not noticed it?
(In reply to Marius Tomaschewski from comment #36) > (In reply to Per Jessen from comment #23) > > # systemctl status SuSEfirewall2 > > It is SuSEfirewall2_init.service which blocks everything. Although both SuSEfirewall2_init and SuSEfirewall2 are disabled? I've just mask'ed both of them and rebooted - # systemd-analyze blame 22.459s wicked.service
How the setup flow works: - wicked (creates+)configures the device and sets the link UP - kernel reports UP and makes carrier detection, e.g.: kernel: e1000e: nic0 NIC Link is Down kernel: IPv6: ADDRCONF(NETDEV_UP): nic0: link is not ready kernel: e1000e: nic0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None - kernel reports that carrier is detected, ... (LOWER_UP, state UP) kernel: IPv6: ADDRCONF(NETDEV_CHANGE): nic0: link becomes ready -> ipv4 starts to be usable, except the network driver is laying about carrier - kernel sets the fe80 ipv6 address on the device (derived from MAC address) which is marked tentative, that is not usable yet - kernel preforms duplicate (mac) address detection on the fe80 address - kernel removes the tentative flag and reports NEWADDR -> ipv6 starts to be usable, except the network driver is laying about carrier - wicked is monitoring this and with DHCLIENT6_MODE != auto starts DHCPv6 here. - kernel sends router solicitations (RS) packages and waits until router advertisement (RA) arrives - kernel receives RA, e.g: interface eth0 { AdvManagedFlag on; ## request address + DNS/NTP via DHCPv6 AdvOtherConfigFlag on; ## request DNS/NTP/... config only AdvDefaultLifetime 1800; ## >0, set default route prefix 2001:db8::/64 { ## /64 is usable for autoconf AdvAutonomous on; ## assign autoconf addresses from prefix } } and forwards the main part in NEWLINK update and prefix in NEWPREFIX events to the user space programs - kernel sets ipv6 default route via this router for 1800 seconds - wicked receives the events and starts sending DHCPv6 packets - kernel sets tenative ${prefix}:${eui64 from mac}/64 and if enabled, also ${prefix}:${random}/64 addresses - kernel preforms duplicate address detection on the autoconf IPs - wicked receives DHCPv6 response e.g. here and sets them on the interface - kernel sets the tentative DHCPv6 address on the interface - kernel removes the tentative flag from AUTOv6 IPs and emits NEWADDR events - kernel removes the tentative flag from DHCPv6 IPs and emits NEWADDR events - wicked executes netconfig to write e.g. /etc/resolv.conf - wicked ifup makes a final check if there are still some tentative addresses somewhere, reports "up" (network-up) + returns 0.
Created attachment 671846 [details] ifbind.sh script to emulate hotplugging (In reply to Per Jessen from comment #37) > Although both SuSEfirewall2_init and SuSEfirewall2 are disabled? > I've just mask'ed both of them and rebooted - > > # systemd-analyze blame > 22.459s wicked.service Your RA contains: AdvReachableTime 0; AdvRetransTimer 0; that is the kernel forgets to send the RA info (AdvManagedFlag on; AdvOtherConfigFlag on;) to the userspace via NEWLINK. But there should be still a NEWPREFIX event which wicked monitors too. -> comment 6 + http://bugzilla.opensuse.org/show_bug.cgi?id=933590#c13 Then either your nic driver is laying about carrier (RS packets do not arrive on the router), e.g. because it makes an 15s STP forward-delay which it forgets to mention or the RS/RA packets do not arrive for some other reason. Difficult to say what happens on these layers. Does the kernel configure autoconf addresses earlier? You can try to monitor things: ip monitor link address prefix neigh route and then use ifbind.sh unbind eth0 ; sleep 3 ; ifbind bind eth0 or rmmod driver ; modprobe driver to remove and add the interface again. When you set WICKED_LOG_LEVEL=debug2 (+rcwickedd restart), it will trace events, e.g. "obtain config and address via DHCPv6" that it received from the kernel. But when it happens at boot only, this may not work/show the causes; cold devices on boot quite often behave differently than devices, which were used before (sending to /dev/null after carrier detection).
(In reply to Per Jessen from comment #37) > # systemd-analyze blame > 22.459s wicked.service But this means, the RA arrived and wicked ifup did not wait the full ~30sec timeout. Can you provide logs from such boot so we see how the times are / what needs time?
(In reply to Marius Tomaschewski from comment #40) > (In reply to Per Jessen from comment #37) > > # systemd-analyze blame > > 22.459s wicked.service > > But this means, the RA arrived and wicked ifup did not wait the full > ~30sec timeout. Can you provide logs from such boot so we see how > the times are / what needs time? Apart from not having mask'ed the two firewall services, I think the logs I provided earlier will show the situation. I'll be happy to provide new logs too.
Created attachment 671907 [details] log covering "rmmod tg3 && modprobe tg3" I added debug2 as described in comment#39 and unloaded/loaded tg3, the network driver module. The process takes about 21seconds, from 18:34:14 to 18:34:35. The log looks interesting.
(In reply to Marius Tomaschewski from comment #36) > The delay with configuration for non-existing devices or without carrier > was always there -- wicked simply waits until they interface appear / get > carrier to setup them. > Except of this, I don't know what all you've reconfigured installed, > updated or changed in this time. Perhaps you just did not noticed it? I am not sure I understand what you mean. There has never been such delay before using the same configuration.
(In reply to Per Jessen from comment #41) > I think the logs I provided earlier will show the situation. Well, it shows that there is a timeout because events seem to not arrive, but not why/no details. The new logs do not show anything more, because there was no "rcwickedd restart" after "debug2" change. Please verify, the log setup is at debug2: grep ^WICKED /etc/sysconfig/network/config should be: WICKED_DEBUG='all' WICKED_LOG_LEVEL='debug2' Then please execute the following commands / paste to a root shell: rcwickedd restart ## so the debug2 is used tcpdump -envfi any icmp6 2>&1 | systemd-cat -t "tcpdump" & ip monitor link address prefix neigh route | systemd-cat -t "ip-monitor" & rmmod tg3 ### remove the nic driver cursor=`journalctl -n 10 -o export | grep ^__CURSOR= | sed -e 's|^__CURSOR=||' | head -1` modprobe tg3 ### load the nic driver sleep 40 ip a s | systemd-cat -t "ip-addrs" ip -6 r s | systemd-cat -t "ip-route6" sysctl -a | grep ^net.ipv6.conf | systemd-cat -t "ipv6-sysctls" kill $(jobs -lp) journalctl -c "$cursor" -o short-precise > journal.txt and attach journal.txt please.
Created attachment 672377 [details] journal.txt as requested office34:/tmp # grep ^WICKED /etc/sysconfig/network/config WICKED_DEBUG="all" WICKED_LOG_LEVEL="debug2"
(In reply to Per Jessen from comment #45) > Created attachment 672377 [details] > journal.txt as requested > > office34:/tmp # grep ^WICKED /etc/sysconfig/network/config > WICKED_DEBUG="all" > WICKED_LOG_LEVEL="debug2" OK, the workaround for bug 975020 (I've cloned the original sles12 bug) makes an ipv6 link refresh only (needed due to missed event), when the client requested auto mode already; in your case, the config is applied later: Apr 11 14:16:55.687794 office34 wickedd-dhcp6[858]: eth0: Found usable link-local IPv6 address: fe80::218:feff:fe6a:730b Apr 11 14:16:55.690365 office34 wickedd-dhcp6[858]: eth0: update IPv6 RA<unmanaged> Prefix<2a03:7520:4c68:1::/64 onlink,autoconf>[14400, 86400] .... Apr 11 14:16:56.441782 office34 wickedd-dhcp6[858]: ni_objectmodel_dhcp6_acquire_svc(dev=eth0) In my case, the client is faster: Apr 11 13:11:51.977785 xanthos wickedd-dhcp6[11017]: ni_objectmodel_dhcp6_acquire_svc(dev=nic0) Apr 11 13:11:51.978008 xanthos wickedd-dhcp6[11017]: nic0: Request to acquire DHCPv6 lease with UUID 4f860b57-1fb9-0300-0f2b-000008000000 in mode auto Apr 11 13:11:57.720839 xanthos wickedd-dhcp6[11017]: nic0: update IPv6 RA<unmanaged> Prefix<2001:470:5036:2000::/64 onlink,no-autoconf>[2400, 3600] Apr 11 13:11:57.720859 xanthos wickedd-dhcp6[11017]: IPv6 link info refresh of nic0 interface Apr 11 13:11:57.720968 xanthos wickedd-dhcp6[11017]: nic0: obtain config and address via DHCPv6 This is probably the attempt to set a static the default route. I've removed the config check and refresh unconditionally in: https://github.com/openSUSE/wicked/pull/637 A test RPM with the unconditional workaround will be available in: http://download.opensuse.org/repositories/network:/wicked:/testing/openSUSE_Leap_42.1/
(In reply to Name Deleted from comment #43) > I am not sure I understand what you mean. There has never been such delay > before using the same configuration. Really, the bug, that a config for a not-existing device (_even_ marked as hotplug, that is hotplug mode is not honored properly) is older. Waiting for carrier on existing devices is normal -- e.g. it makes no sense to send DHCP*, RS (IPv6 RA request), preform duplicate address detection, try to detect topology loops (bridge STP), authenticate, ... without carrier. When possible, we'd like that "ifup ; ping" just work, so we _use_ the time we are allowed to wait (WAIT_FOR_INTERACES default of 30sec). We also cannot lower the timeout or we would not wait long enough for some drivers.
(In reply to Marius Tomaschewski from comment #47) > (In reply to Name Deleted from comment #43) > > I am not sure I understand what you mean. There has never been such delay > > before using the same configuration. > > Really, the bug, that a config for a not-existing device (_even_ marked as > hotplug, that is hotplug mode is not honored properly) is older. > Waiting for carrier on existing devices is normal -- e.g. it makes no sense > to send DHCP*, RS (IPv6 RA request), preform duplicate address detection, > try to detect topology loops (bridge STP), authenticate, ... without carrier. > When possible, we'd like that "ifup ; ping" just work, so we _use_ the time > we are allowed to wait (WAIT_FOR_INTERACES default of 30sec). We also cannot > lower the timeout or we would not wait long enough for some drivers. The question is why so far, with the exact same config, there has never been such a problem and suddenly after an update - there is one? Also - if I lower manually WAIT_FOR_INTERACES to (say) 3 seconds - what could be the negative impact?
(In reply to Name Deleted from comment #48) > The question is why so far, with the exact same config, there has never been > such a problem and suddenly after an update - there is one? Perhaps you just didn't noticed it before? Or there were no obsolete ifcfg-files before? The cable of the additional link was connected? > Also - if I lower manually WAIT_FOR_INTERACES to (say) 3 seconds - what > could be the negative impact? ifup will return after 3 sec, setup will happen in background and when you have services which bind to specific IPs, the'll fail.
(In reply to Marius Tomaschewski from comment #49) > (In reply to Name Deleted from comment #48) > > The question is why so far, with the exact same config, there has never been > > such a problem and suddenly after an update - there is one? > > Perhaps you just didn't noticed it before? Or there were no obsolete > ifcfg-files before? The cable of the additional link was connected? > You can be 100% sure that I would notice it. I have specifically tuned my boot up time to be very fast because I use dual boot. So when it suddenly showed 33 extra seconds I surely saw that too. It is in no way related to cables or anything else. As I mentioned - same settings, same hardware, same network, same cables. The only different thing is this extra 33 seconds. > > Also - if I lower manually WAIT_FOR_INTERACES to (say) 3 seconds - what > > could be the negative impact? > > ifup will return after 3 sec, setup will happen in background and > when you have services which bind to specific IPs, the'll fail. Thanks for explaining. That is surely not desired.
(In reply to Name Deleted from comment #50) > It is in no way related to cables or anything else. As I mentioned - same > settings, same hardware, same network, same cables. The only different thing > is this extra 33 seconds. OK, then please install the old *wicked* version which still worked and provide debug as described in comment 44 from both versions. A 'hwinfo --netcard | grep "Driver Modules"' will show you the module names to use in rmmod/modprobe lines instead of tg3.
Per, does the change / RPMs from comment 45 remove the dhcpv6 delays for you?
(In reply to Marius Tomaschewski from comment #51) > > OK, then please install the old *wicked* version which still worked > and provide debug as described in comment 44 from both versions. > Which one is the old? In YaST I search or wicked and see: 0.6.30-6.1-x86_64 from vendor openSUSE (Installed) 0.6.28-3.1-x86_64 from openSUSE-leap/42.1-Update with priority 99 and vendor openSUSE 0.6.27-1.1-x86_64 from Main Repository (OSS) with priority 99 and vendor openSUSE Do I have to downgrade wicked-service and libwicked-0-6 too? > A 'hwinfo --netcard | grep "Driver Modules"' will show you the module > names to use in rmmod/modprobe lines instead of tg3. This gives: Driver Modules: "e1000e" Driver Modules: "r8169" Also do I have to re-debug again for current version as I have already sent the debug for it in an earlier comment?
(In reply to Marius Tomaschewski from comment #52) > Per, > does the change / RPMs from comment 45 remove the dhcpv6 delays for you? Yes, I installed yesterday, and observed a delay of about 12secs, much better. Is that about what you expected?
(In reply to Per Jessen from comment #54) > (In reply to Marius Tomaschewski from comment #52) > > Per, > > does the change / RPMs from comment 45 remove the dhcpv6 delays for you? > > Yes, I installed yesterday, and observed a delay of about 12secs, much > better. Is that about what you expected? Yes, I'd say 12sec are OK on a cold/unused link - depends on the NIC [how long it needs to initialize, to detect carrier, forwards packets after it reported link UP] and if your router is using a "3sec silence between answers" or not (some do). (In reply to Name Deleted from comment #53) > (In reply to Marius Tomaschewski from comment #51) > Which one is the old? In YaST I search or wicked and see: > > 0.6.30-6.1-x86_64 from vendor openSUSE (Installed) > 0.6.28-3.1-x86_64 from openSUSE-leap/42.1-Update with priority 99 and vendor > openSUSE > 0.6.27-1.1-x86_64 from Main Repository (OSS) with priority 99 and vendor > openSUSE I've no idea which version were working for you before. See /var/log/zypp/history, it should show what were installed when. > Do I have to downgrade wicked-service and libwicked-0-6 too? Yes please. It should be not even possible to omit them. > Also do I have to re-debug again for current version as I have already sent > the debug for it in an earlier comment? I'd say not the current (old 0.6.30 in the meantime), but master build: http://download.opensuse.org/repositories/network:/wicked:/master/openSUSE_Leap_42.1/ which contains the missed kernel event workaround fix from comment 46.
(In reply to Marius Tomaschewski from comment #55) > (In reply to Per Jessen from comment #54) > > (In reply to Marius Tomaschewski from comment #52) > > > Per, > > > does the change / RPMs from comment 45 remove the dhcpv6 delays for you? > > > > Yes, I installed yesterday, and observed a delay of about 12secs, much > > better. Is that about what you expected? > > Yes, I'd say 12sec are OK on a cold/unused link - depends on the NIC > [how long it needs to initialize, to detect carrier, forwards packets > after it reported link UP] and if your router is using a "3sec silence > between answers" or not (some do). > > (In reply to Name Deleted from comment #53) > > (In reply to Marius Tomaschewski from comment #51) > > Which one is the old? In YaST I search or wicked and see: > > > > 0.6.30-6.1-x86_64 from vendor openSUSE (Installed) > > 0.6.28-3.1-x86_64 from openSUSE-leap/42.1-Update with priority 99 and vendor > > openSUSE > > 0.6.27-1.1-x86_64 from Main Repository (OSS) with priority 99 and vendor > > openSUSE > > I've no idea which version were working for you before. > See /var/log/zypp/history, it should show what were installed when. > > > Do I have to downgrade wicked-service and libwicked-0-6 too? > > Yes please. It should be not even possible to omit them. > > > Also do I have to re-debug again for current version as I have already sent > > the debug for it in an earlier comment? > > I'd say not the current (old 0.6.30 in the meantime), but master build: > > http://download.opensuse.org/repositories/network:/wicked:/master/ > openSUSE_Leap_42.1/ > > which contains the missed kernel event workaround fix from comment 46. Ok, I have tried all 3 versions: 0.6.30 0.6.31 (from master) 0.6.28 (which I found in the zypper history log) - this one I even tried with reverting to the previous version of systemd With all these the delay is still 33 seconds. Bummer. I also tried to set eth0 to "On cable connection" and to "Never". Only when set to boot "Never" the delay is gone. I followed the procedure from comment 44. I didn't reboot between creating the logs. The exact thing I did was: 1. Go to YaST and install the package versions 2. Execute commands: rcwickedd restart tcpdump -envfi any icmp6 2>&1 | systemd-cat -t "tcpdump" & ip monitor link address prefix neigh route | systemd-cat -t "ip-monitor" & rmmod r8169 rmmod e1000e cursor=`journalctl -n 10 -o export | grep ^__CURSOR= | sed -e 's|^__CURSOR=||' | head -1` modprobe r8169 modprobe e1000e sleep 40 ip a s | systemd-cat -t "ip-addrs" ip -6 r s | systemd-cat -t "ip-route6" sysctl -a | grep ^net.ipv6.conf | systemd-cat -t "ipv6-sysctls" kill $(jobs -lp) journalctl -c "$cursor" -o short-precise > journal-XXX.txt where XXX is the version of wicked as you will see in the 3 log files: https://goo.gl/mnSuJx Another thing I would like to ask for further clarification: (In reply to Marius Tomaschewski from comment #49) > (In reply to Name Deleted from comment #48) > > Also - if I lower manually WAIT_FOR_INTERACES to (say) 3 seconds - what > > could be the negative impact? > > ifup will return after 3 sec, setup will happen in background and > when you have services which bind to specific IPs, the'll fail. How do I check which these services are and if they bind to specific IPs? The internal zone interface (eth0) is used for sharing the internet connection to that internal network + file transfers through sftp and smb. All I need is to make sure that when that network is online (the net switch is turned on) it will be possible to use these network functions + of course get rid of the 33 sec delay. I hope you can help. Thanks.
This is an autogenerated message for OBS integration: This bug (972471) was mentioned in https://build.opensuse.org/request/show/390531 42.1 / wicked
This is an autogenerated message for OBS integration: This bug (972471) was mentioned in https://build.opensuse.org/request/show/390219 Factory / wicked
*** Bug 980460 has been marked as a duplicate of this bug. ***
I'm on the openSUSE Tumbleweed has long been a similar problem. I'm thinking to configure the network through systemd.
This is an autogenerated message for OBS integration: This bug (972471) was mentioned in https://build.opensuse.org/request/show/440824 Factory / wicked
This is an autogenerated message for OBS integration: This bug (972471) was mentioned in https://build.opensuse.org/request/show/440828 13.2 / wicked https://build.opensuse.org/request/show/440829 42.1 / wicked https://build.opensuse.org/request/show/440830 42.2 / wicked
You can always set DHCLIENT6_MODE='managed' in the ifcfg file to override the IPv6 RA sent by the router and the behavior of the dhcp6 client. I've improved/corrected the workaround for the missed kernel events; see also bug 916956 and bug 975020 + upstream accepted kernel fixes at: http://patchwork.ozlabs.org/patch/512486/ http://patchwork.ozlabs.org/patch/512646/
*** Bug 993516 has been marked as a duplicate of this bug. ***
openSUSE-RU-2016:2978-1: An update that has 19 recommended fixes can now be installed. Category: recommended (moderate) Bug References: 865573,902897,913861,916613,916948,972471,973355,974231,975466,981887,982231,982952,984088,985894,988794,988954,989741,997027,998413 CVE References: Sources used: openSUSE Leap 42.1 (src): wicked-0.6.39-12.1 openSUSE 13.2 (src): wicked-0.6.39-36.1
Thank you!
SUSE-RU-2016:3055-1: An update that has 6 recommended fixes can now be installed. Category: recommended (moderate) Bug References: 972471,975466,988794,988954,997027,998413 CVE References: Sources used: SUSE Linux Enterprise Server for Raspberry Pi 12-SP2 (src): wicked-0.6.39-34.1 SUSE Linux Enterprise Server 12-SP2 (src): wicked-0.6.39-34.1 SUSE Linux Enterprise Desktop 12-SP2 (src): wicked-0.6.39-34.1
openSUSE-RU-2016:3139-1: An update that has 6 recommended fixes can now be installed. Category: recommended (moderate) Bug References: 972471,975466,988794,988954,997027,998413 CVE References: Sources used: openSUSE Leap 42.2 (src): wicked-0.6.39-4.1
SUSE-RU-2016:3192-1: An update that has 19 recommended fixes can now be installed. Category: recommended (moderate) Bug References: 865573,902897,913861,916613,916948,972471,973355,974231,975466,981887,982231,982952,984088,985894,988794,988954,989741,997027,998413 CVE References: Sources used: SUSE Linux Enterprise Server 12-SP1 (src): wicked-0.6.39-28.3.1 SUSE Linux Enterprise Desktop 12-SP1 (src): wicked-0.6.39-28.3.1
*** Bug 1013918 has been marked as a duplicate of this bug. ***