Bug 1173402 - Kernel delays boot by 12s if ip= option given
Kernel delays boot by 12s if ip= option given
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
Current
Other Other
: P5 - None : Normal (vote)
: ---
Assigned To: openSUSE Kernel Bugs
E-mail List
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2020-06-26 11:19 UTC by Fabian Vogt
Modified: 2022-07-06 16:25 UTC (History)
6 users (show)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Log with rd.neednet=1 rd.debug (501.50 KB, text/plain)
2020-07-02 11:13 UTC, Fabian Vogt
Details
Log with rd.neednet=1 rd.debug ip=dhcp (103.52 KB, text/plain)
2020-07-02 11:15 UTC, Fabian Vogt
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Fabian Vogt 2020-06-26 11:19:42 UTC
For networking in the initrd, the "ip=dhcp" (or similiar) option can be set in the kernel cmdline. This is parsed by dracut and used to configure wicked.
However, this option is also parsed by the kernel in its ip autoconfig code, which then ends up trying to bring up the network itself. This bringup includes a 12s delay to wait for interfaces to appear: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/ipconfig.c?id=3eb30c51a6dda26d0c5b8824b7c0515502f1c161#n1469
At this point the initrd isn't even mounted, so there are no kernel modules available and no interfaces appear.

@kernel maintainers: Is it maybe possible to not do autoconfig in this circumstance? It might be possible to check whether there are any network  drivers at all (the default kernel seems to have those as modules only) or whether it's booting an initrd. Though in the latter case I guess it's possible that some initrd out there relies on the kernel's successful autoconfig?

@dracut maintainers: It seems like "rd.neednet=1" is enough to get dracut to acquire an address over DHCP, so a workaround is to just drop "ip=dhcp". I'd like to have confirmation that this is intended behaviour and won't break without notice in the future. This doesn't help for cases without DHCP though.

If it's not possible to avoid the "ip=foo" induced kernel delay, it might be a good idea to provide "rd.ip=foo" or similar as alias, which is ignored by the kernel.
Comment 1 Fabian Vogt 2020-07-01 13:08:41 UTC
> @dracut maintainers: ...

Adding needinfo
Comment 2 Thomas Blume 2020-07-01 16:17:18 UTC
(In reply to Fabian Vogt from comment #0)
> @dracut maintainers: It seems like "rd.neednet=1" is enough to get dracut to
> acquire an address over DHCP, so a workaround is to just drop "ip=dhcp". I'd
> like to have confirmation that this is intended behaviour and won't break
> without notice in the future. This doesn't help for cases without DHCP
> though.
>
> If it's not possible to avoid the "ip=foo" induced kernel delay, it might be
> a good idea to provide "rd.ip=foo" or similar as alias, which is ignored by
> the kernel.

Need to double check but AFAIK, rd.neednet isn't sufficient, at least if the machine has multiple network interfaces. In this case you need to tell dracut which interface to use.
ip=dhcp works in this case, because it is a wildcard to try all network interfaces.
Comment 3 Fabian Vogt 2020-07-02 06:02:36 UTC
(In reply to Thomas Blume from comment #2)
> (In reply to Fabian Vogt from comment #0)
> > @dracut maintainers: It seems like "rd.neednet=1" is enough to get dracut to
> > acquire an address over DHCP, so a workaround is to just drop "ip=dhcp". I'd
> > like to have confirmation that this is intended behaviour and won't break
> > without notice in the future. This doesn't help for cases without DHCP
> > though.
> 
> Need to double check but AFAIK, rd.neednet isn't sufficient, at least if the
> machine has multiple network interfaces. In this case you need to tell
> dracut which interface to use.
> ip=dhcp works in this case, because it is a wildcard to try all network
> interfaces.

Dammit. I did some more tests meanwhile and it doesn't work in some cases,
presumably in those where there were no ifcfg-* files copied into the initrd.
It still acquires an IP over DHCP, but the dracut initqueue times out anyway.

Any idea what could be used as a workaround?
Comment 4 Thomas Blume 2020-07-02 09:17:33 UTC
(In reply to Fabian Vogt from comment #3)
> Dammit. I did some more tests meanwhile and it doesn't work in some cases,
> presumably in those where there were no ifcfg-* files copied into the initrd.
> It still acquires an IP over DHCP, but the dracut initqueue times out anyway.
> 
> Any idea what could be used as a workaround?

dhcp is the fallback, if there is no ip= config given.
Maybe it doesn't try all interfaces then.
But better I check the boot log.
Could you provide a log from a failing boot with the rd.debug boot parameter?
Comment 5 Fabian Vogt 2020-07-02 11:13:02 UTC
Created attachment 839297 [details]
Log with rd.neednet=1 rd.debug
Comment 6 Fabian Vogt 2020-07-02 11:15:23 UTC
Created attachment 839298 [details]
Log with rd.neednet=1 rd.debug ip=dhcp

Done. It can be seen that in both cases, DHCP is done successfully, but only with ip=dhcp, the did-setup flag file got created.
Comment 7 Thomas Blume 2020-07-02 15:14:53 UTC
Thanks, the culprit is in the ifup.sh script.
Because there is no ip= boot parameter, it goes to:

-->
# no ip option directed at our interface?
if [ -z "$NO_AUTO_DHCP" ] && [ ! -e /tmp/net.${netif}.up ]; then
    if [ -e /tmp/net.bootdev ]; then
        BOOTDEV=$(cat /tmp/net.bootdev)
        if [ "$netif" = "$BOOTDEV" ] || [ "$BOOTDEV" = "$(cat /sys/class/net/${netif}/address)" ]; then
            load_ipv6
            do_dhcp
        fi
    else
        if getargs 'ip=dhcp6'; then
            load_ipv6
            do_dhcp -6
        fi
        if getargs 'ip=dhcp'; then
            do_dhcp -4
        fi
    fi
--<

Comparing that to the setup loop when the ip= parameter is present, the call to the netroot script is missing:

-->
# Specific configuration, spin through the kernel command line
# looking for ip= lines
for p in $(getargs ip=); do
[...]
        # and finally, finish interface set up if there isn't already a script
        # to do so (which is the case in the dhcp path)
        if [ ! -e $hookdir/initqueue/setup_net_$netif.sh ]; then
            setup_net $netif
            source_hook initqueue/online $netif
            if [ -z "$manualup" ]; then
                /sbin/netroot $netif
            fi
        fi

        if command -v wicked >/dev/null && [ -z "$manualup" ]; then
            /sbin/netroot $netif
        fi

--<

It does setup the ip but not the netroot.
So, without the ip= parameter, dracut doesn't seems to recognize that the system root is provided via network.
According to the manpage, rd.neednet only commands dracut to set up the network, but not to provide netroot:

-->
       rd.neednet=1
           boolean, bring up network even without netroot set
--<

I guess you would have to add a netroot parameter to make dracut aware that system root is provided via network.
Maybe also root=dhcp works, though.
Comment 8 Fabian Vogt 2020-07-02 16:08:39 UTC
(In reply to Thomas Blume from comment #7)
> Thanks, the culprit is in the ifup.sh script.
> Because there is no ip= boot parameter, it goes to:
> 
> -->
> # no ip option directed at our interface?
> if [ -z "$NO_AUTO_DHCP" ] && [ ! -e /tmp/net.${netif}.up ]; then
>     if [ -e /tmp/net.bootdev ]; then
>         BOOTDEV=$(cat /tmp/net.bootdev)
>         if [ "$netif" = "$BOOTDEV" ] || [ "$BOOTDEV" = "$(cat
> /sys/class/net/${netif}/address)" ]; then
>             load_ipv6
>             do_dhcp
>         fi
>     else
>         if getargs 'ip=dhcp6'; then
>             load_ipv6
>             do_dhcp -6
>         fi
>         if getargs 'ip=dhcp'; then
>             do_dhcp -4
>         fi
>     fi
> --<


That only does something in the bootdev/ip=dhcp case though AFAICT, but a bit further up there is:

-->
# No ip lines default to dhcp
ip=$(getarg ip)

if [ -z "$NO_AUTO_DHCP" ] && [ -z "$ip" ]; then
    if [ "$netroot" = "dhcp6" ]; then
        do_dhcp -6
    else
        do_dhcp -4
    fi

    for s in $(getargs nameserver); do
        [ -n "$s" ] || continue
        echo nameserver $s >> /tmp/net.$netif.resolv.conf
    done
fi
--<

So this behaviour is a combination of:
- Interfaces without explicit ip= config fall back to DHCP
- It waits for (any) explicitly configured interface to be up
- ip=dhcp counts as "explicitly configuring all interfaces"

This might just need a '>/tmp/net.${netif}.did-setup' there?

As there is no ip parameter set, it can't possibly reach the code in
the 'for p in $(getargs ip=)' loop which creates the did-setup flag,
but net-genrules.sh generated the initqueue hook waiting for them
irregardless of any ip parameters.

> ...
> 
> It does setup the ip but not the netroot.
> So, without the ip= parameter, dracut doesn't seems to recognize that the
> system root is provided via network.
> According to the manpage, rd.neednet only commands dracut to set up the
> network, but not to provide netroot:
> 
> -->
>        rd.neednet=1
>            boolean, bring up network even without netroot set
> --<

That's actually what it's used for here - have network in the initrd, but boot from local disk.
It's because a dracut module (ignition) uses network for downloading files.

> I guess you would have to add a netroot parameter to make dracut aware that
> system root is provided via network.
> Maybe also root=dhcp works, though.

That might break booting... Even if that works as alternative to ip=dhcp without the delay, it doesn't seem like a good idea to use it instead, just the name alone would need a big disclaimer...
Comment 9 Thomas Blume 2020-07-03 07:06:14 UTC
(In reply to Fabian Vogt from comment #8)
> 
> That only does something in the bootdev/ip=dhcp case though AFAICT, but a
> bit further up there is:

Hm, but the code writing did-setup is outside the 'for p in $(getargs ip=)' loop.
The conditionals there are:

-->
# no ip option directed at our interface?
if [ -z "$NO_AUTO_DHCP" ] && [ ! -e /tmp/net.${netif}.up ]; then
--<

"$NO_AUTO_DHCP" is only set in special cases and /tmp/net.${netif}.up shouldn't have been written either since that is done in the for loop.
In this case it should go down to:


--<
    if [ $? -eq 0 ] && [ -z "$(ls /tmp/leaseinfo.${netif}*)" ]; then
         > /tmp/net.${netif}.did-setup
         if [ -e /sys/class/net/${netif}/address ]; then
             > /tmp/net.$(cat /sys/class/net/${netif}/address).did-setup
         fi
    fi
--<

> So this behaviour is a combination of:
> - Interfaces without explicit ip= config fall back to DHCP
> - It waits for (any) explicitly configured interface to be up
> - ip=dhcp counts as "explicitly configuring all interfaces"
> 
> This might just need a '>/tmp/net.${netif}.did-setup' there?

I already burned my fingers by writing did-setup too early in the code.
That might skip the setup of other configured interfaces or the setup of an dual ipv4/ipv6 stack.
I want to be very careful here.

> As there is no ip parameter set, it can't possibly reach the code in
> the 'for p in $(getargs ip=)' loop which creates the did-setup flag,
> but net-genrules.sh generated the initqueue hook waiting for them
> irregardless of any ip parameters.

Correct, the culprit is the missing did-setup, but I'm not sure yet why it doesn't get written. Do you see any other reason?

> > ...
> > 
> > It does setup the ip but not the netroot.
> > So, without the ip= parameter, dracut doesn't seems to recognize that the
> > system root is provided via network.
> > According to the manpage, rd.neednet only commands dracut to set up the
> > network, but not to provide netroot:
> > 
> > -->
> >        rd.neednet=1
> >            boolean, bring up network even without netroot set
> > --<
> 
> That's actually what it's used for here - have network in the initrd, but
> boot from local disk.
> It's because a dracut module (ignition) uses network for downloading files.
> 
> > I guess you would have to add a netroot parameter to make dracut aware that
> > system root is provided via network.
> > Maybe also root=dhcp works, though.
> 
> That might break booting... Even if that works as alternative to ip=dhcp
> without the delay, it doesn't seem like a good idea to use it instead, just
> the name alone would need a big disclaimer...

Ok, so I don't need to consider the netroot case.
Let's focus on the did-setup then.
Comment 10 Thomas Blume 2020-07-03 09:52:43 UTC
(In reply to Thomas Blume from comment #9)
> "$NO_AUTO_DHCP" is only set in special cases and /tmp/net.${netif}.up
> shouldn't have been written either since that is done in the for loop.
> In this case it should go down to:
> 
> 
> --<
>     if [ $? -eq 0 ] && [ -z "$(ls /tmp/leaseinfo.${netif}*)" ]; then
>          > /tmp/net.${netif}.did-setup
>          if [ -e /sys/class/net/${netif}/address ]; then
>              > /tmp/net.$(cat /sys/class/net/${netif}/address).did-setup
>          fi
>     fi
> --<

According to the logs it is the presence of the leaseinfo file that prevents the writing of did-setup:

-->
[   10.385337] dracut-initqueue[419]: ++ ls /tmp/leaseinfo.ens3.dhcp.ipv4
[   10.385969] dracut-initqueue[364]: + '[' -z /tmp/leaseinfo.ens3.dhcp.ipv4 ']'
[   10.386677] dracut-initqueue[364]: + exit 0
--<

leasinfo is created by wicked, so that looks correct:

-->
[    3.808884] dracut-initqueue[364]: + wicked test dhcp4 --format leaseinfo --output /tmp/leaseinfo.ens3.dhcp.ipv4 --request - ens3
[    3.810522] dracut-initqueue[404]: wicked: ens3: Request to acquire DHCPv4 lease with UUID bab7fd5e-1e18-0500-9401-000001000000
[   10.316896] dracut-initqueue[404]: wicked: ens3: Committed DHCPv4 lease with address 10.160.66.28 (lease time 1800 sec, renew in 900 sec, rebind in 1575 sec)
[   10.337314] dracut-initqueue[364]: + dhcp_wicked_apply -4
[   10.338395] dracut-initqueue[364]: + unset IPADDR INTERFACE BROADCAST NETWORK PREFIXLEN ROUTES GATEWAYS MTU HOSTNAME DNSDOMAIN DNSSEARCH DNSSERVERS
[   10.341019] dracut-initqueue[364]: + '[' -f /tmp/leaseinfo.ens3.dhcp.ipv4 ']'
[   10.342144] dracut-initqueue[364]: + . /tmp/leaseinfo.ens3.dhcp.ipv4
--<

which makes me wonder whether the -z in the test is correct.
Comment 11 Thomas Blume 2020-07-03 14:13:04 UTC
Can you please give the dracut package at:

https://build.opensuse.org/package/binaries/home:tsaupe:branches:openSUSE:Factory:dracut-bsc1173402/dracut/openSUSE_Factory

a try?
Comment 12 Fabian Vogt 2020-07-06 14:45:05 UTC
(In reply to Thomas Blume from comment #11)
> Can you please give the dracut package at:
> 
> https://build.opensuse.org/package/binaries/home:tsaupe:branches:openSUSE:
> Factory:dracut-bsc1173402/dracut/openSUSE_Factory
> 
> a try?

Works!

Looking at this again, this is quite the mess...
AFAICT, touching the did-setup file after a successful "do_dhcp" is generally the right way.

This is currently not done, which lead to https://github.com/tblume/dracut/commit/0b998beebc008f2dda2eed8a36d59f4a258cb27a adding another path to create did-setup.
This new path was broken by https://github.com/tblume/dracut/commit/28be2f3692d6a641c841f636c03e45c5f3c4ec6b, which you now fixed with https://github.com/tblume/dracut/commit/487fb167b82c88871e6d1ace21ccdb0a48e9032d.

What should be done IMO:

- Touch did-setup in the "# No ip lines default to dhcp" case after successful do_dhcp
- Touch did-setup in the "# no ip option directed at our interface?" case after successful do_dhcp

0b998beebc008f2dda2eed8a36d59f4a258cb27a introduced the code to handle the latter, but:
- It also runs in cases where the do_dhcp in this block wasn't actually run (like in this bug report)
- The $? it checks is meaningless at this point due to the ifs

In the "rd.neednet=1 bootdev=eth0" case it actually runs do_dhcp twice for the interface, as both
"# No ip ..." blocks apply. Maybe those blocks should be merged?
Comment 13 Thomas Blume 2020-07-07 12:41:38 UTC
(In reply to Fabian Vogt from comment #12)
> What should be done IMO:
> 
> - Touch did-setup in the "# No ip lines default to dhcp" case after
> successful do_dhcp
> - Touch did-setup in the "# no ip option directed at our interface?" case
> after successful do_dhcp
> 
> 0b998beebc008f2dda2eed8a36d59f4a258cb27a introduced the code to handle the
> latter, but:
> - It also runs in cases where the do_dhcp in this block wasn't actually run
> (like in this bug report)
> - The $? it checks is meaningless at this point due to the ifs
> 
> In the "rd.neednet=1 bootdev=eth0" case it actually runs do_dhcp twice for
> the interface, as both
> "# No ip ..." blocks apply. Maybe those blocks should be merged?

Ok, let me investigate that. 
The patch from bsc#1172807 also needs more investigation, because on dual stack setups, dracut must not skip the ipv6 setup when the ipv4 setup was successfull.
Will let you know as soon as I have something to try.
Comment 14 Thomas Blume 2020-07-20 12:37:55 UTC
(In reply to Fabian Vogt from comment #12)
> (In reply to Thomas Blume from comment #11)
> > Can you please give the dracut package at:
> > 
> > https://build.opensuse.org/package/binaries/home:tsaupe:branches:openSUSE:
> > Factory:dracut-bsc1173402/dracut/openSUSE_Factory
> > 
> > a try?
> 
> Works!
> 
> Looking at this again, this is quite the mess...
> AFAICT, touching the did-setup file after a successful "do_dhcp" is
> generally the right way.
> 
> This is currently not done, which lead to
> https://github.com/tblume/dracut/commit/
> 0b998beebc008f2dda2eed8a36d59f4a258cb27a adding another path to create
> did-setup.
> This new path was broken by
> https://github.com/tblume/dracut/commit/
> 28be2f3692d6a641c841f636c03e45c5f3c4ec6b, which you now fixed with
> https://github.com/tblume/dracut/commit/
> 487fb167b82c88871e6d1ace21ccdb0a48e9032d.
> 
> What should be done IMO:
> 
> - Touch did-setup in the "# No ip lines default to dhcp" case after
> successful do_dhcp

The code there really looks wrong:

-->
if [ -z "$NO_AUTO_DHCP" ] && [ -z "$ip" ]; then
    if [ "$netroot" = "dhcp6" ]; then
        do_dhcp -6
    else
        do_dhcp -4
    fi
--<

That runs do_dhcp -4 in any case if there is no ip= line and dhcp isn't explicitly disabled.
But that code is apparently for the netroot case, so it needs to be checked whether $netroot is filled.
After this, a did-setup is indeed appropriate IMHO.

> - Touch did-setup in the "# no ip option directed at our interface?" case
> after successful do_dhcp
> 
> 0b998beebc008f2dda2eed8a36d59f4a258cb27a introduced the code to handle the
> latter, but:
> - It also runs in cases where the do_dhcp in this block wasn't actually run
> (like in this bug report)
> - The $? it checks is meaningless at this point due to the ifs

Hm, did you check the latest code?
There is:

if [ $? -eq 0 ] && [ -n "$(ls /tmp/leaseinfo.${netif}*)" ]; then

the leasinfo file won't be there if there was no dhcp setup.

> In the "rd.neednet=1 bootdev=eth0" case it actually runs do_dhcp twice for
> the interface, as both
> "# No ip ..." blocks apply. Maybe those blocks should be merged?

Those parameters are a bit different, bootdev is documented as:
-->
bootdev=<interface>
           specify network interface to use routing and netroot information from. Required if multiple ip= lines are used.
--<

So it is bound to the netroot case and a specific interface, whereas rd.neednet does the network setup unconditionally.
Also, the code for evaluating rd.neednet isn't in ifup.sh but in net-genrules.sh.

Considering the duplicate run of do_dhcp, I'd blame this at the missing check for netroot as outlined above.
Comment 15 Fabian Vogt 2020-07-20 13:19:13 UTC
(In reply to Thomas Blume from comment #14)
> (In reply to Fabian Vogt from comment #12)
> > - Touch did-setup in the "# no ip option directed at our interface?" case
> > after successful do_dhcp
> > 
> > 0b998beebc008f2dda2eed8a36d59f4a258cb27a introduced the code to handle the
> > latter, but:
> > - It also runs in cases where the do_dhcp in this block wasn't actually run
> > (like in this bug report)
> > - The $? it checks is meaningless at this point due to the ifs
> 
> Hm, did you check the latest code?
> There is:
> 
> if [ $? -eq 0 ] && [ -n "$(ls /tmp/leaseinfo.${netif}*)" ]; then
> 
> the leasinfo file won't be there if there was no dhcp setup.

Yes, but the use of $? there is still wrong AFAICT.

> > In the "rd.neednet=1 bootdev=eth0" case it actually runs do_dhcp twice for
> > the interface, as both
> > "# No ip ..." blocks apply. Maybe those blocks should be merged?
> 
> Those parameters are a bit different, bootdev is documented as:
> -->
> bootdev=<interface>
>            specify network interface to use routing and netroot information
> from. Required if multiple ip= lines are used.
> --<
> 
> So it is bound to the netroot case and a specific interface, whereas
> rd.neednet does the network setup unconditionally.

That seems to be working with netroot, but not depend on netroot.
It doesn't say that bootdev can only be specified if multiple ip= options
are set, but only that it is mandatory in that case.

> Also, the code for evaluating rd.neednet isn't in ifup.sh but in
> net-genrules.sh.
> 
> Considering the duplicate run of do_dhcp, I'd blame this at the missing
> check for netroot as outlined above.

Not tested, but would that just move the "dhcp twice" case to netroot only?
There's nothing which prevents the "# no ip option directed at our interface?"
block from running after "# No ip lines default to dhcp" as the former doesn't
touch /tmp/net.${netif}.up, does it?
Comment 16 Thomas Blume 2020-07-20 14:52:20 UTC
(In reply to Fabian Vogt from comment #15)
> > Hm, did you check the latest code?
> > There is:
> > 
> > if [ $? -eq 0 ] && [ -n "$(ls /tmp/leaseinfo.${netif}*)" ]; then
> > 
> > the leasinfo file won't be there if there was no dhcp setup.
> 
> Yes, but the use of $? there is still wrong AFAICT.

Ah, you mean when there is no ip= option at all and no bootdev either?
But how should the network get set up via ifup then?
Using rd.neednet without specifying an interface would make the script bail out early:

-->
# Huh? No $1?
[ -z "$1" ] && exit 1
--<

There is the ifname boot parameter, but that is again covered in another script, e.g. parse-ifname.sh.

> > Those parameters are a bit different, bootdev is documented as:
> > -->
> > bootdev=<interface>
> >            specify network interface to use routing and netroot information
> > from. Required if multiple ip= lines are used.
> > --<
> > 
> > So it is bound to the netroot case and a specific interface, whereas
> > rd.neednet does the network setup unconditionally.
> 
> That seems to be working with netroot, but not depend on netroot.
> It doesn't say that bootdev can only be specified if multiple ip= options
> are set, but only that it is mandatory in that case.

Hm, right, even though it doesn't make sense it is allowed to set rd.neednet together with bootdev.
But in any case the bootdev option requires an specific interface name while in the rd.neednet only case $iface should be empty.

> Not tested, but would that just move the "dhcp twice" case to netroot only?
> There's nothing which prevents the "# no ip option directed at our
> interface?"
> block from running after "# No ip lines default to dhcp" as the former
> doesn't
> touch /tmp/net.${netif}.up, does it?

Maybe, I'm overlooking something but AFAICS, the second if would call do_dhcp only when bootdev is set or ip=dhcp(6);

-->
    if [ -e /tmp/net.bootdev ]; then
        BOOTDEV=$(cat /tmp/net.bootdev)
        if [ "$netif" = "$BOOTDEV" ] || [ "$BOOTDEV" = "$(cat /sys/class/net/${netif}/address)" ]; then
            load_ipv6
            do_dhcp
        fi
    else
        if getargs 'ip=dhcp6'; then
            load_ipv6
            do_dhcp -6
        fi
        if getargs 'ip=dhcp'; then
            do_dhcp -4
        fi
    fi
--<

So, it shouldn't run again after the "# No ip lines default to dhcp" part.
But it would write did-setup, which is fine since it could have been setup in "# No ip lines default to dhcp".

Please note I appreciate your comments and will try to addess them as best as possible, but without breaking anything working.
Unfortunately code changes here are very delicate since it has a big impact on netboot.
Comment 17 Thomas Blume 2020-07-20 15:26:19 UTC
(In reply to Thomas Blume from comment #16)
> (In reply to Fabian Vogt from comment #15)
> > > Hm, did you check the latest code?
> > > There is:
> > > 
> > > if [ $? -eq 0 ] && [ -n "$(ls /tmp/leaseinfo.${netif}*)" ]; then
> > > 
> > > the leasinfo file won't be there if there was no dhcp setup.
> > 
> > Yes, but the use of $? there is still wrong AFAICT.
> 
> Ah, you mean when there is no ip= option at all and no bootdev either?
> But how should the network get set up via ifup then?
> Using rd.neednet without specifying an interface would make the script bail
> out early:
> 
> -->
> # Huh? No $1?
> [ -z "$1" ] && exit 1
> --<
> 
> There is the ifname boot parameter, but that is again covered in another
> script, e.g. parse-ifname.sh.

Sorry, that reply was incomplete, $? is also needed in the do_dhcp case, because the function has parts where the return code is set explicitly.
checking for leaseinfo is not the same, because this file is created by wicked in yet another function (dhcp_wicked_apply).
Comment 18 Fabian Vogt 2020-07-23 11:43:03 UTC
(In reply to Thomas Blume from comment #16)
> (In reply to Fabian Vogt from comment #15)
> > > Hm, did you check the latest code?
> > > There is:
> > > 
> > > if [ $? -eq 0 ] && [ -n "$(ls /tmp/leaseinfo.${netif}*)" ]; then
> > > 
> > > the leasinfo file won't be there if there was no dhcp setup.
> > 
> > Yes, but the use of $? there is still wrong AFAICT.
> 
> Ah, you mean when there is no ip= option at all and no bootdev either?

No, that "$?" is wrong because there is an if preceding it.
Example:
if false; then false; fi
echo $? # 0

So using $? after an if is not working as expected here, it will only indicate
the exit code of "do_dhcp -4" or "do_dhcp", but never of "do_dhcp -6".

> But how should the network get set up via ifup then?
> Using rd.neednet without specifying an interface would make the script bail
> out early:
> 
> -->
> # Huh? No $1?
> [ -z "$1" ] && exit 1
> --<
> 
> There is the ifname boot parameter, but that is again covered in another
> script, e.g. parse-ifname.sh.

ifup.sh is called by udev for each interface, so $1 is always set.
Just "rd.neednet=1" works fine.

> > Not tested, but would that just move the "dhcp twice" case to netroot only?
> > There's nothing which prevents the "# no ip option directed at our
> > interface?"
> > block from running after "# No ip lines default to dhcp" as the former
> > doesn't
> > touch /tmp/net.${netif}.up, does it?
> 
> Maybe, I'm overlooking something but AFAICS, the second if would call
> do_dhcp only when bootdev is set or ip=dhcp(6);

Indeed, so that would probably work. It's totally not obvious though.

> -->
>     if [ -e /tmp/net.bootdev ]; then
>         BOOTDEV=$(cat /tmp/net.bootdev)
>         if [ "$netif" = "$BOOTDEV" ] || [ "$BOOTDEV" = "$(cat
> /sys/class/net/${netif}/address)" ]; then
>             load_ipv6
>             do_dhcp
>         fi
>     else
>         if getargs 'ip=dhcp6'; then
>             load_ipv6
>             do_dhcp -6
>         fi
>         if getargs 'ip=dhcp'; then
>             do_dhcp -4
>         fi
>     fi
> --<
> 
> So, it shouldn't run again after the "# No ip lines default to dhcp" part.
> But it would write did-setup, which is fine since it could have been setup
> in "# No ip lines default to dhcp".

Yay, spaghetti...

> Please note I appreciate your comments and will try to addess them as best
> as possible, but without breaking anything working.
> Unfortunately code changes here are very delicate since it has a big impact
> on netboot.

Yep, this is the reverse-engineering part of cleanup ;-)

I don't see any downside to merging the "# No ip lines default to dhcp" block
with the "# no ip option directed at our interface?" one down below, it would
make the various options and flows much more obvious.
Comment 19 Thomas Blume 2020-07-23 14:09:55 UTC
(In reply to Fabian Vogt from comment #18)
> No, that "$?" is wrong because there is an if preceding it.
> Example:
> if false; then false; fi
> echo $? # 0
> 
> So using $? after an if is not working as expected here, it will only
> indicate
> the exit code of "do_dhcp -4" or "do_dhcp", but never of "do_dhcp -6".

Argh, you are right, took me some time to see it, thanks!

[...]

> Yay, spaghetti...

fully agreed
 
> > Please note I appreciate your comments and will try to addess them as best
> > as possible, but without breaking anything working.
> > Unfortunately code changes here are very delicate since it has a big impact
> > on netboot.
> 
> Yep, this is the reverse-engineering part of cleanup ;-)
> 
> I don't see any downside to merging the "# No ip lines default to dhcp" block
> with the "# no ip option directed at our interface?" one down below, it would
> make the various options and flows much more obvious.

Ok, working on an update.
Comment 20 Thomas Blume 2020-07-31 08:31:17 UTC
I've created testpackages with a proposed patch at:

 https://build.opensuse.org/package/show/home:tsaupe:branches:openSUSE:Factory:dracut-bsc1173402/dracut

It worked well in my test setup, but it would be good if you could confirm.

Here is the commit for reference:

https://github.com/tblume/dracut/commit/2127271afdbf8ab669a86b382e75eb51a9965396
Comment 21 Fabian Vogt 2020-07-31 10:04:29 UTC
(In reply to Thomas Blume from comment #20)
> I've created testpackages with a proposed patch at:
> 
>  https://build.opensuse.org/package/show/home:tsaupe:branches:openSUSE:
> Factory:dracut-bsc1173402/dracut
> 
> It worked well in my test setup, but it would be good if you could confirm.

Confirmed to work with:
rd.neednet -> DHCPv4
rd.neednet ip=dhcp6 -> DHCPv6
rd.neednet bootdev=eth0 -> DHCPv4 (once)

> Here is the commit for reference:
> 
> https://github.com/tblume/dracut/commit/
> 2127271afdbf8ab669a86b382e75eb51a9965396

I'd actually go a step further: https://github.com/Vogtinator/dracut-1/commit/b39e4fe4d53a0806f82aa303690a7a7da9d09711

That also makes other differences (bugs?) more obvious:
- in the bootdev case, it calls load_ipv6, then do_dhcp. AFAICT do_dhcp is equivalent to do_dhcp -4 for wicked, but probably not dhclient?
- for s in $(getargs nameserver); do ... done is only done if ip is not set, but not in the ip=dhcp/dhcp6 case or the bootdev case
- load_ipv6 is missing before do_dhcp -6
Comment 22 Thomas Blume 2020-07-31 14:25:53 UTC
(In reply to Fabian Vogt from comment #21)
> (In reply to Thomas Blume from comment #20)
> > I've created testpackages with a proposed patch at:
> > 
> >  https://build.opensuse.org/package/show/home:tsaupe:branches:openSUSE:
> > Factory:dracut-bsc1173402/dracut
> > 
> > It worked well in my test setup, but it would be good if you could confirm.
> 
> Confirmed to work with:
> rd.neednet -> DHCPv4
> rd.neednet ip=dhcp6 -> DHCPv6
> rd.neednet bootdev=eth0 -> DHCPv4 (once)
> 
> > Here is the commit for reference:
> > 
> > https://github.com/tblume/dracut/commit/
> > 2127271afdbf8ab669a86b382e75eb51a9965396
> 
> I'd actually go a step further:
> https://github.com/Vogtinator/dracut-1/commit/
> b39e4fe4d53a0806f82aa303690a7a7da9d09711
> 
> That also makes other differences (bugs?) more obvious:
> - in the bootdev case, it calls load_ipv6, then do_dhcp. AFAICT do_dhcp is
> equivalent to do_dhcp -4 for wicked, but probably not dhclient?
> - for s in $(getargs nameserver); do ... done is only done if ip is not set,
> but not in the ip=dhcp/dhcp6 case or the bootdev case
> - load_ipv6 is missing before do_dhcp -6

Thanks a lot Fabian, I took the freedom to merge your commit and addressed the above issues.
Looking into the dhclient manpage I've found that it defaults to dhcp4, so it should behave just like wicked.

There was one little quirks in your patch, it didn't address the case when net.bootdev exists.
In that case do_dhcp (-4) would have been already called in the BOOTDEV branch.
So, I've added a check.

My updated patch is at:

https://github.com/tblume/dracut/commit/824f0308428d3e270a346ad69f38f70387304111

Would be nice if you could take another look and double check.
Comment 23 Fabian Vogt 2020-07-31 14:49:27 UTC
(In reply to Thomas Blume from comment #22)
> (In reply to Fabian Vogt from comment #21)
> > (In reply to Thomas Blume from comment #20)
> > > I've created testpackages with a proposed patch at:
> > > 
> > >  https://build.opensuse.org/package/show/home:tsaupe:branches:openSUSE:
> > > Factory:dracut-bsc1173402/dracut
> > > 
> > > It worked well in my test setup, but it would be good if you could confirm.
> > 
> > Confirmed to work with:
> > rd.neednet -> DHCPv4
> > rd.neednet ip=dhcp6 -> DHCPv6
> > rd.neednet bootdev=eth0 -> DHCPv4 (once)
> > 
> > > Here is the commit for reference:
> > > 
> > > https://github.com/tblume/dracut/commit/
> > > 2127271afdbf8ab669a86b382e75eb51a9965396
> > 
> > I'd actually go a step further:
> > https://github.com/Vogtinator/dracut-1/commit/
> > b39e4fe4d53a0806f82aa303690a7a7da9d09711
> > 
> > That also makes other differences (bugs?) more obvious:
> > - in the bootdev case, it calls load_ipv6, then do_dhcp. AFAICT do_dhcp is
> > equivalent to do_dhcp -4 for wicked, but probably not dhclient?
> > - for s in $(getargs nameserver); do ... done is only done if ip is not set,
> > but not in the ip=dhcp/dhcp6 case or the bootdev case
> > - load_ipv6 is missing before do_dhcp -6
> 
> Thanks a lot Fabian, I took the freedom to merge your commit and addressed
> the above issues.
> Looking into the dhclient manpage I've found that it defaults to dhcp4, so
> it should behave just like wicked.

Ok, then I don't understand why it calls load_ipv6...

> There was one little quirks in your patch, it didn't address the case when
> net.bootdev exists.
> In that case do_dhcp (-4) would have been already called in the BOOTDEV
> branch.
> So, I've added a check.

It's actually in the else branch of "if [ -e /tmp/net.bootdev ];", so that
check is AFAICT redundant and can never trigger.

> My updated patch is at:
> 
> https://github.com/tblume/dracut/commit/
> 824f0308428d3e270a346ad69f38f70387304111
> 
> Would be nice if you could take another look and double check.

LGTM, except for the redundant check, but I didn't actually test it (yet).
Comment 24 Thomas Blume 2020-07-31 15:18:43 UTC
(In reply to Fabian Vogt from comment #23)
> > Thanks a lot Fabian, I took the freedom to merge your commit and addressed
> > the above issues.
> > Looking into the dhclient manpage I've found that it defaults to dhcp4, so
> > it should behave just like wicked.
> 
> Ok, then I don't understand why it calls load_ipv6...

Yes that apparently wasn't necessary.

> > There was one little quirks in your patch, it didn't address the case when
> > net.bootdev exists.
> > In that case do_dhcp (-4) would have been already called in the BOOTDEV
> > branch.
> > So, I've added a check.
> 
> It's actually in the else branch of "if [ -e /tmp/net.bootdev ];", so that
> check is AFAICT redundant and can never trigger.

Indeed, here is the update:

https://github.com/tblume/dracut/commit/19e6da525e05b95cccdcaf51249d4d06da5b992f

> > Would be nice if you could take another look and double check.
> 
> LGTM, except for the redundant check, but I didn't actually test it (yet).

Will provide some testpackages.
Comment 25 Thomas Blume 2020-08-03 10:07:44 UTC
(In reply to Thomas Blume from comment #24)
> Indeed, here is the update:
> 
> https://github.com/tblume/dracut/commit/
> 19e6da525e05b95cccdcaf51249d4d06da5b992f
> 
> > > Would be nice if you could take another look and double check.
> > 
> > LGTM, except for the redundant check, but I didn't actually test it (yet).
> 
> Will provide some testpackages.

Testpackages at: 

https://build.opensuse.org/package/show/home:tsaupe:branches:openSUSE:Factory:dracut-bsc1173402/dracut

they worked well in my test setup.
Comment 26 Fabian Vogt 2020-08-03 12:08:58 UTC
(In reply to Thomas Blume from comment #25)
> (In reply to Thomas Blume from comment #24)
> > Indeed, here is the update:
> > 
> > https://github.com/tblume/dracut/commit/
> > 19e6da525e05b95cccdcaf51249d4d06da5b992f
> > 
> > > > Would be nice if you could take another look and double check.
> > > 
> > > LGTM, except for the redundant check, but I didn't actually test it (yet).
> > 
> > Will provide some testpackages.
> 
> Testpackages at: 
> 
> https://build.opensuse.org/package/show/home:tsaupe:branches:openSUSE:
> Factory:dracut-bsc1173402/dracut
> 
> they worked well in my test setup.

Working for me as expected as well.
Comment 27 Thomas Blume 2020-08-06 10:00:03 UTC
(In reply to Fabian Vogt from comment #26)
> > they worked well in my test setup.
> 
> Working for me as expected as well.

Thanks!
Patch was merged upstream and will be added to one of the next dracut updates for Tumbleweed.
Comment 28 Fabian Vogt 2020-08-19 13:34:52 UTC
(In reply to Thomas Blume from comment #27)
> (In reply to Fabian Vogt from comment #26)
> > > they worked well in my test setup.
> > 
> > Working for me as expected as well.
> 
> Thanks!
> Patch was merged upstream and will be added to one of the next dracut
> updates for Tumbleweed.

Great, that leaves the kernel side (see the initial description).
Comment 29 Miroslav Beneš 2021-12-31 12:29:04 UTC
It fell through the cracks. Fabian, is this still a problem and the kernel needs a change?
Comment 30 Fabian Vogt 2022-01-10 15:56:31 UTC
(In reply to Miroslav Beneš from comment #29)
> It fell through the cracks. Fabian, is this still a problem and the kernel
> needs a change?

It looks like this was fixed with https://github.com/openSUSE/kernel-source/commit/7e286c21568afbac1e9d3de493cc598312bfc248, but without referencing this report.