Bug 1224860

Summary: recent wicked change related to STARTMODE?
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Patrick Schaaf <patrick.schaaf>
Component: Network
Assignee: wicked maintainers <wicked-maintainers>
Status: RESOLVED INVALID
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: mt, patrick.schaaf
Version: Current
Target Milestone: ---
Hardware: Other
OS: Other

Description Patrick Schaaf 2024-05-22 08:19:37 UTC
I have been running a few hundred servers on Tumbleweed for roughly six years, with the network set up using
* wicked
* generated ifcfg files (own generator)
* vlan interfaces on top of a vlan enabled bridge on top of bond (lacp) on top of eth

With the recent wicked update (0.6.74 -> 0.6.75), tested today, the network did not come up as it used to. No errors appeared in the logs or systemd services; the interfaces simply were not configured.

The change in behaviour apparently relates to the recursive determination of what to bring up. I'm used to having (and thought that at some point in the past I had to have) STARTMODE configured like this:

penn:~ # egrep -H ^STARTMODE /etc/sysconfig/network/ifcfg-*
/etc/sysconfig/network/ifcfg-eth0:STARTMODE=manual
/etc/sysconfig/network/ifcfg-eth1:STARTMODE=manual
/etc/sysconfig/network/ifcfg-lacp:STARTMODE=manual
/etc/sysconfig/network/ifcfg-lo:STARTMODE=nfsroot
/etc/sysconfig/network/ifcfg-v0001:STARTMODE=auto
/etc/sysconfig/network/ifcfg-v0178:STARTMODE=auto
/etc/sysconfig/network/ifcfg-v2080:STARTMODE=auto
/etc/sysconfig/network/ifcfg-vbr:STARTMODE=manual

That relies on the vlan interfaces (v0001 and so on, STARTMODE=auto) triggering the bring-up of vbr, which references lacp, which in turn references eth0 and eth1.

With the wicked update, that no longer works. Setting STARTMODE=auto in all of the ifcfg files makes it work again.

I'm not sure whether this is a bug or an intended behaviour change, but I thought it best to report it.
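
The workaround described above (STARTMODE=auto in every ifcfg file) could be applied with a small helper like the following. This is only a minimal sketch and not part of wicked: `set_startmode` is a hypothetical function name, it assumes GNU sed (`sed -i`), and it is best tried on a copy of the config directory first.

```sh
# set_startmode MODE FILE...
# Replace an existing STARTMODE line in each file, or append one if the
# file has none. Assumes GNU sed for in-place editing.
set_startmode() {
    mode=$1
    shift
    for f in "$@"; do
        if grep -q '^STARTMODE=' "$f"; then
            sed -i "s/^STARTMODE=.*/STARTMODE=$mode/" "$f"
        else
            echo "STARTMODE=$mode" >> "$f"
        fi
    done
}

# Example call (paths taken from the report; ifcfg-lo keeps nfsroot):
# set_startmode auto /etc/sysconfig/network/ifcfg-eth0 \
#     /etc/sysconfig/network/ifcfg-eth1 \
#     /etc/sysconfig/network/ifcfg-lacp \
#     /etc/sysconfig/network/ifcfg-vbr
```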
Comment 1 Marius Tomaschewski 2024-05-23 09:23:21 UTC
Thanks for the report!

Generally, interface configs with STARTMODE=manual aren't included
in "wicked ifup all" (network.service start), but need a dedicated
"wicked ifup <name>".

Further, interfaces with STARTMODE=off are ignored by any "ifup".

A vlan requires/pulls its underlying "lower" interface:

   eth0 <--lower-- vlan

(v0001 requires/pulls its lower vbr, if I read the above correctly.)
In fact, the lower is already required to create the vlan, but not
the other way around: starting the underlying/lower device does not
start all vlans on top (vbr has no reference to its vlans).

The bridge or bonding port relation is different and points in
the other direction:

  eth1 --master--> bond0 (or bridge)
  eth2 --master----^

Besides the requirement that a port needs a started master
and pulls it into the "up" set, we also make an inverted "pull"
of the ports by the master, as it needs its ports to find carrier.

Together, the trees look like this, for example:

  eth1 --master--> bond0 --master--> bridge <--lower-- vlanX
  eth2 --master----^                      ^----lower-- vlanX

Combined with STARTMODE=auto and STARTMODE=off, this is a challenge
to implement, and it was broken (mostly ignored) in the past.
It could be that the current implementation is not 100% correct yet.

There are new utility options:

wicked <ifup|ifdown|ifreload> --dry-run <ifnames…|all>

They show a config/system tree with the interfaces marked with "+" and
"-" in front.

We will review it again.

Please provide the "wicked ifup --dry-run all" output for the case
that doesn't work for you and where you'd expect it to work.
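
To compare dry-run results across many hosts, the marked lines can be reduced to a plain list of interface names. The filter below is a rough sketch: `marked_for_ifup` is a hypothetical helper, it assumes that "+" marks the interfaces selected by the operation and that the output has the `wicked: ` prefix seen elsewhere in this report, and it merges stdout and stderr because it is not confirmed here which stream wicked writes to.

```sh
# Read "wicked ... --dry-run" output on stdin and print the names of
# the interfaces marked with "+", one per line, without duplicates.
marked_for_ifup() {
    grep -E '^wicked: \+ ' |
        sed -E 's/^wicked: \+ +(\+-- |\*-- )?//; s/ *\[[0-9]+\]$//' |
        sort -u
}

# Example call:
# wicked ifup --dry-run all 2>&1 | marked_for_ifup
```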
Comment 3 Patrick Schaaf 2024-05-23 09:36:55 UTC
(In reply to Marius Tomaschewski from comment #1)

Thank you for looking into this, and for all the explanation. What you write matches my understanding / expectation. Note that I do not use STARTMODE=off but (on the "underlings") STARTMODE=manual.

Here is a bit more info about my actual configuration, this time from a VM without the lacp bond in between, so a bit simpler (and still showing the same change):

awe:~ # egrep -H '(STARTMODE|BRIDGE_PORTS|ETHERDEVICE|BOOTPROTO|IPADDR)' /etc/sysconfig/network/ifcfg-*
/etc/sysconfig/network/ifcfg-eth0:STARTMODE=manual
/etc/sysconfig/network/ifcfg-eth0:BOOTPROTO=none
/etc/sysconfig/network/ifcfg-lo:IPADDR=127.0.0.1/8
/etc/sysconfig/network/ifcfg-lo:STARTMODE=nfsroot
/etc/sysconfig/network/ifcfg-lo:BOOTPROTO=static
/etc/sysconfig/network/ifcfg-v178:STARTMODE=auto
/etc/sysconfig/network/ifcfg-v178:BOOTPROTO=static
/etc/sysconfig/network/ifcfg-v178:IPADDR0=192.168.178.249/24
/etc/sysconfig/network/ifcfg-v178:ETHERDEVICE=vbr
/etc/sysconfig/network/ifcfg-v2082:STARTMODE=auto
/etc/sysconfig/network/ifcfg-v2082:BOOTPROTO=static
/etc/sysconfig/network/ifcfg-v2082:IPADDR0=172.20.82.249/24
/etc/sysconfig/network/ifcfg-v2082:ETHERDEVICE=vbr
/etc/sysconfig/network/ifcfg-vbr:STARTMODE=manual
/etc/sysconfig/network/ifcfg-vbr:BOOTPROTO=none
/etc/sysconfig/network/ifcfg-vbr:BRIDGE_PORTS=eth0

And here is the output you requested:

awe:~ # wicked ifup --dry-run all
wicked: System interface hierarchy structure:
wicked:   lo[1]
wicked:   bond0[2]
wicked:   eth0[3]
wicked: Config interface hierarchy structure:
wicked: + lo [1]
wicked:   bond0 [2]
wicked:   v178
wicked:    +-- vbr
wicked:         *-- eth0 [3]
wicked:   vbr
wicked:    *-- eth0 [3]
wicked:   v2082
wicked:    +-- vbr
wicked:         *-- eth0 [3]

Interface level result after booting:

awe:~ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 02:7b:a9:9b:f5:4a brd ff:ff:ff:ff:ff:ff
3: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether da:b0:fe:00:b1:01 brd ff:ff:ff:ff:ff:ff
    altname enp0s3

I'll happily adapt my generator to just set STARTMODE=auto for all the interfaces, if that's the better approach. I'm mostly reporting this because it is a behaviour change and might surprise others (or might be interesting for you development-wise).
Comment 4 Marius Tomaschewski 2024-05-23 09:59:40 UTC
(In reply to Patrick Schaaf from comment #0)
>I'm used to (and thought at some point in the past had to) have STARTMODE
> configured like this:
[...]

Further questions:
Do you remember why? Was it some kind of workaround?

In your case, I'd read your config as saying that you did not want
to start the interfaces at boot time: you've explicitly removed the underlying
interfaces from the start at boot ("auto"="boot"="onboot", i.e. from the "all" set)
and marked that you want to start them manually later.
As the vlans require a "disabled" aka "manual" interface, we IMO shouldn't
start it in "ifup all" / at boot either.

Generally, bond (or team) ports should use STARTMODE=hotplug, so we don't
wait for both eth0 and eth1 to appear: one of them is sufficient (min_slaves=1
is the default setting for a bond) to start the bond.
When one is missing and the available port does not have carrier yet, the
bond driver in the kernel will handle it and inherit carrier when the 2nd
one arrives and provides the carrier -- wicked will add (hotplug) it to
the bond once it appears and wait for the bond, not for the ports.
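
As an illustration of the hotplug advice for bond ports, the sketch below writes example ifcfg files into a scratch directory. The interface names eth0, eth1 and lacp are taken from comment #0; `write_bond_example` is a hypothetical helper, and the `BONDING_MODULE_OPTS` value is an assumed LACP setting, not the reporter's actual configuration.

```sh
# write_bond_example DIR
# Write illustrative ifcfg files for an LACP bond with hotplug ports.
write_bond_example() {
    dir=$1
    for port in eth0 eth1; do
        # Ports use hotplug so that ifup does not wait for both of them.
        printf '%s\n' "STARTMODE='hotplug'" "BOOTPROTO='none'" \
            > "$dir/ifcfg-$port"
    done
    # The bond itself is started at boot and enslaves whichever port shows up.
    cat > "$dir/ifcfg-lacp" <<'EOF'
STARTMODE='auto'
BOOTPROTO='none'
BONDING_MASTER='yes'
BONDING_MODULE_OPTS='mode=802.3ad miimon=100'
BONDING_SLAVE_0='eth0'
BONDING_SLAVE_1='eth1'
EOF
}

# Example call (review the files before copying them anywhere):
# write_bond_example "$(mktemp -d)"
```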

With bridge ports it depends: when you bridge multiple segments and want to
ensure ifup waits/defers until the port appears, set the port to STARTMODE=auto;
when you "don't care about a port", e.g. dhcp & co. is provided by another
"auto" one and it's just a link to an "extension switch", use STARTMODE=hotplug.
Comment 5 Marius Tomaschewski 2024-05-23 10:05:14 UTC
(In reply to Patrick Schaaf from comment #3)
> (In reply to Marius Tomaschewski from comment #1)
> 
> Thank you for looking into this, and all the explanation. What you write,
> matches my understanding / expectation. Note that I do not use STARTMODE=off
> but (on the "underlings") STARTMODE=manual.

Yes. While STARTMODE=off disables an interface in any ifup, STARTMODE=manual
only excludes it from the "wicked ifup all" call (made by network.service)
at boot, i.e. it needs a manual "wicked ifup <name1>" call.

See also ifcfg(5) man page:
```
STARTMODE {manual*|auto|nfsroot|hotplug|off}
       Choose when the interface should be set up.
       manual 
              Interface will be set up if ifup is called manually
       auto   
              Interface will be set up as soon as it is available (and service
              network was started). This either happens at boot time when network
              is starting or via hotplug when a interface is added to the system
              (by adding a device or loading a driver). To be backward compliant
              onboot, on and boot are aliases for auto.
       hotplug
              Interface will be activated when it is available. Use instead of  auto
              for  devices which may be missed, such as bonding slaves, usb or other
              plugable hardware.
       nfsroot
              Nearly like auto, but interfaces with this startmode will be not shut
              down by default. Use this mode when you use a root filesystem via
              network or want to avoid interface shutdown. To force a nfsroot
              interface down, use either wicked ifdown --force device-down
              <interface> or ifdown <interface> -o force.
       off    
              Will never be activated.
```
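
The distinction between the modes can be summarised as a small lookup. This is a sketch of the behaviour as explained in this report, not of wicked's implementation: `startmode_scope` is a hypothetical name, treating an unset value as manual follows the `manual*` marker in the excerpt above, and placing hotplug in the "all" set is an assumption.

```sh
# startmode_scope MODE
# Print when an interface with this STARTMODE is considered by ifup:
#   all       included in "wicked ifup all" (boot)
#   explicit  only on an explicit "wicked ifup <name>"
#   never     ignored by any ifup
startmode_scope() {
    case ${1:-manual} in
        off) echo never ;;
        manual) echo explicit ;;
        auto|onboot|on|boot|nfsroot|hotplug) echo all ;;
        *) echo unknown ;;
    esac
}

# Example calls:
# startmode_scope manual
# startmode_scope onboot
```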
Comment 6 Marius Tomaschewski 2024-05-23 10:45:09 UTC
Let's try it out -- with a simple config first:

epyc1:/etc/sysconfig/network # cat ifcfg-dummy0
STARTMODE=manual
epyc1:/etc/sysconfig/network # cat ifcfg-vlan10
STARMODE=auto
ETHERDEVICE=dummy0
VLAN_ID=10
epyc1:/etc/sysconfig/network # wicked ifup --dry-run  all
[…]
wicked: Config interface hierarchy structure:
wicked: + lo [1]
wicked:   vlan10
wicked:    +-- dummy0
wicked:   dummy0

== dummy0 is manual and skipped + vlan10 requires it -> skip both

epyc1:/etc/sysconfig/network # wicked ifup --dry-run dummy0
[…]
wicked:   lo [1]
wicked:   vlan10
wicked: +  +-- dummy0
wicked: + dummy0

== dummy0 requested manually -> start it, but not its vlan interface on top

epyc1:/etc/sysconfig/network # wicked ifup --dry-run vlan10
[…]
wicked:   lo [1]
wicked: + vlan10
wicked: +  +-- dummy0
wicked: + dummy0

== vlan10 requested manually + requires dummy0 with STARTMODE=manual,
   but because this is a manual / explicit request, STARTMODE=manual
   is not excluded -> both are started

Going to do the same with your config (but with bond0 as the bridge port and
using the 2 interfaces from comment #0 with ifcfg-lacp)...
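
The first case above (an "auto" vlan on top of a "manual" lower) can be spotted in a config directory before rebooting. The following is a minimal sketch under stated assumptions: `check_lower_startmode` and `get_var` are hypothetical helpers, only the direct vlan-to-lower relation via ETHERDEVICE is checked (bridge and bond port relations are not followed), and an unset STARTMODE is treated as manual.

```sh
# get_var FILE NAME: print the last value assigned to NAME, unquoted.
get_var() {
    sed -n "s/^$2=//p" "$1" | tail -n 1 | tr -d "\"'"
}

# check_lower_startmode DIR
# Report boot-time interfaces whose ETHERDEVICE is manual or off.
check_lower_startmode() {
    dir=$1
    for cfg in "$dir"/ifcfg-*; do
        [ -f "$cfg" ] || continue
        mode=$(get_var "$cfg" STARTMODE)
        lower=$(get_var "$cfg" ETHERDEVICE)
        # Only interfaces meant to start at boot are of interest here.
        case $mode in auto|onboot|on|boot) ;; *) continue ;; esac
        [ -n "$lower" ] || continue
        [ -f "$dir/ifcfg-$lower" ] || continue
        lower_mode=$(get_var "$dir/ifcfg-$lower" STARTMODE)
        case ${lower_mode:-manual} in
            manual|off)
                echo "${cfg##*/ifcfg-}: lower $lower has STARTMODE=${lower_mode:-manual}" ;;
        esac
    done
}

# Example call:
# check_lower_startmode /etc/sysconfig/network
```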
Comment 7 Patrick Schaaf 2024-05-23 11:29:07 UTC
(In reply to Marius Tomaschewski from comment #6)

> == dummy0 is manual and skipped + vlan10 requires it -> skip both

If that is the intended behaviour (and the recent change was just about actually making it so, where previously it only accidentally brought both up), then that's fine with me and I will adapt my ways!

(No need, from my point of view, for you to recreate my more complicated setup.)

Thank you again for the answers, they give me the direction I need. I'd be happy for the ticket to be closed.
Comment 8 Marius Tomaschewski 2024-05-23 13:30:07 UTC
(In reply to Patrick Schaaf from comment #7)
> (In reply to Marius Tomaschewski from comment #6)
> 
> > == dummy0 is manual and skipped + vlan10 requires it -> skip both
> 
> If that is the intended behaviour (and recent change was just about actually
> making it so, previously it only accidentally brought both up) - then fine
> with me + I will adapt my ways!

OK. Yes, previously we did not consider STARTMODE=manual and off properly.

> (no need from my pov for you to try and recreate my more complicated setup)
> 
> Thank you again for the answers, they give me the direction I need. I'd be
> happy for the ticket to be closed.

OK, thanks for the report anyway! Closing as "problem described is not a bug."