Bug 1212627 - NetworkManager: network interface not configured before SSH daemon is started
Summary: NetworkManager: network interface not configured before SSH daemon is started
Status: NEW
Alias: None
Product: openSUSE Leap Micro
Classification: openSUSE
Component: Base (show other bugs)
Version: 5.4
Hardware: Other Other
: P5 - None : Normal
Target Milestone: ---
Assignee: Jonathan Kang
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-22 14:55 UTC by Adam Majer
Modified: 2023-07-03 12:31 UTC (History)
5 users (show)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
sshd can't bind to ip address (18.43 KB, application/gzip)
2023-06-28 12:19 UTC, Adam Majer
Details
fails after NetworkManager (18.17 KB, application/gzip)
2023-06-28 12:27 UTC, Adam Majer
Details
trace level log (44.59 KB, application/gzip)
2023-06-29 13:10 UTC, Adam Majer
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Adam Majer 2023-06-22 14:55:24 UTC
Basically, SSH is not waiting for network to be setup before starting resulting in SSH that is not brought on boot.


Jun 22 14:18:59 localhost sshd[845]: error: Bind to port 22 on 2a01:2f8:2012:9cdf:0:aead:1:aead failed: Cannot assign requested address.
Jun 22 14:18:59 localhost systemd[1]: sshd.service: Main process exited, code=exited, status=255/EXCEPTION
Jun 22 14:18:59 localhost sshd[845]: fatal: Cannot bind any address.
Jun 22 14:18:59 localhost systemd[1]: sshd.service: Failed with result 'exit-code'.
Jun 22 14:18:59 localhost systemd[1]: Failed to start OpenSSH Daemon.


....

Jun 22 14:18:59 localhost.localdomain NetworkManager[741]: <info>  [1687443539.8178] settings: (eth1): created default wired connection 'Wired connection 2'


It seems that NetworkManager-wait-online.service is triggered after everything is configured, but SSH startup is prior to this.
Comment 1 Adam Majer 2023-06-22 19:18:30 UTC
Workaround was to add

# /etc/systemd/system/sshd.service.d/01-after-network.conf
[Unit]
After=NetworkManager-wait-online.service

[Service]
Restart=always
RestartSec=5s
RestartPreventExitStatus=


The changes to Unit were insufficient as sshd was still not able to bind to specific IP address assigned by NetworkManager

Override of Restart* allowed the process to actually start.
Comment 3 Adam Majer 2023-06-23 07:29:39 UTC
NetworkManager was configured using `nmcli` and persisted.
Comment 4 Jonathan Kang 2023-06-28 06:42:55 UTC
Thanks for the bug report, Adam. Can you attach the output of `journalctl -b` on your system?
Comment 5 Adam Majer 2023-06-28 12:19:28 UTC
Created attachment 867858 [details]
sshd can't bind to ip address

Attached. You see in the journal that systemd started later, but that only happens since I modified sshd service file to always try to restart


systemctl cat sshd
# /usr/lib/systemd/system/sshd.service
[Unit]
Description=OpenSSH Daemon
After=network.target

[Service]
Type=notify
EnvironmentFile=-/etc/sysconfig/ssh
ExecStartPre=/usr/sbin/sshd-gen-keys-start
ExecStartPre=/usr/sbin/sshd -t $SSHD_OPTS
ExecStart=/usr/sbin/sshd -D $SSHD_OPTS
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartPreventExitStatus=255
TasksMax=infinity

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/sshd.service.d/01-after-network.conf
#[Unit]
#After=NetworkManager-wait-online.service

[Service]
Restart=always
RestartSec=5s
RestartPreventExitStatus=
Comment 6 Adam Majer 2023-06-28 12:27:58 UTC
Created attachment 867859 [details]
fails after NetworkManager

With additional changes,

# /etc/systemd/system/sshd.service.d/01-after-network.conf
[Unit]
After=NetworkManager-wait-online.service

[Service]
Restart=always
RestartSec=5s
RestartPreventExitStatus=

SSHD comes up *most of the time* while still failing sometimes. That is, Network Maanger signals via NetworkManager-wait-online.service that it is complete, but the IP address is still not available for bind by SSHD. I've attached a log that shows exactly this.
Comment 7 Jonathan Kang 2023-06-29 01:40:43 UTC
(In reply to Adam Majer from comment #6)
> Created attachment 867859 [details]
> fails after NetworkManager
> 
> With additional changes,
> 
> # /etc/systemd/system/sshd.service.d/01-after-network.conf
> [Unit]
> After=NetworkManager-wait-online.service
> 
> [Service]
> Restart=always
> RestartSec=5s
> RestartPreventExitStatus=
> 
> SSHD comes up *most of the time* while still failing sometimes. That is,
> Network Maanger signals via NetworkManager-wait-online.service that it is
> complete, but the IP address is still not available for bind by SSHD. I've
> attached a log that shows exactly this.

> sshd[1225]: error: Bind to port 22 on 2a01:4f8:c012:9cdf:0:aead:1:aead failed: Cannot assign requested address

sshd waits for an ipv6 address, is this intentional? In this logs you provided, when sshd starts, there is already an ipv4 address available. But there is no ipv6 address assigned yet.
Comment 8 Adam Majer 2023-06-29 07:23:23 UTC
(In reply to Jonathan Kang from comment #7)
> (In reply to Adam Majer from comment #6)
> > Created attachment 867859 [details]
> > fails after NetworkManager
> > 
> > With additional changes,
> > 
> > # /etc/systemd/system/sshd.service.d/01-after-network.conf
> > [Unit]
> > After=NetworkManager-wait-online.service
> > 
> > [Service]
> > Restart=always
> > RestartSec=5s
> > RestartPreventExitStatus=
> > 
> > SSHD comes up *most of the time* while still failing sometimes. That is,
> > Network Maanger signals via NetworkManager-wait-online.service that it is
> > complete, but the IP address is still not available for bind by SSHD. I've
> > attached a log that shows exactly this.
> 
> > sshd[1225]: error: Bind to port 22 on 2a01:4f8:c012:9cdf:0:aead:1:aead failed: Cannot assign requested address
> 
> sshd waits for an ipv6 address, is this intentional? In this logs you
> provided, when sshd starts, there is already an ipv4 address available. But
> there is no ipv6 address assigned yet.

The IPv4 address is non-routable and configured via DHCP, and the only public address available is the IPv6. It's configured manually, so it's not like there is a long wait for RA or something,


ipv6.method:                            manual
ipv6.dns:                               2a01:4ff:ff00::add:1,2a01:4ff:ff00::add:2
ipv6.dns-search:                        --
ipv6.dns-options:                       --
ipv6.dns-priority:                      0
ipv6.addresses:                         2a01:4f8:c012:9cdf::1/64, 2a01:4f8:c012:9cdf:0:aead:1:aead/64


ipv4.method:                            auto
ipv4.dns:                               --
ipv4.dns-search:                        --
ipv4.dns-options:                       --
ipv4.dns-priority:                      0
ipv4.addresses:                         --


In the "old days", all network interfaces should be configured as much as possible before the rest of the system is signaled that network is ready. So, passive configuration via RA for IPv6 would be out-of-scope for this wait but everything else should be blocking. Static IP assignments should definitely be all assigned prior to signaling network setup is successful.

Basically, I would expect all configured network interfaces to be up and configured completely before proceeding. It's not OR condition here but AND. Optionally things could proceed after some timeout, like 2 or 3 minutes, if we are waiting for external resources that may be failing (eg. DHCP)

Finally, an answer here that "it would work if you just listen on wildcard address" is not really a good answer here.
Comment 9 Jonathan Kang 2023-06-29 12:55:43 UTC
(In reply to Adam Majer from comment #8)
> 
> The IPv4 address is non-routable and configured via DHCP, and the only
> public address available is the IPv6. It's configured manually, so it's not
> like there is a long wait for RA or something,
> 
I'm guessing maybe the network link is not yet ready when sshd is started. This needs detailed NetworkManager logs. To collect them, you need add the following lines in /etc/NetworkManager/NetworkManager.conf.

> [logging]
> level=TRACE

> 
> 
> In the "old days", all network interfaces should be configured as much as
> possible before the rest of the system is signaled that network is ready.
> So, passive configuration via RA for IPv6 would be out-of-scope for this
> wait but everything else should be blocking. Static IP assignments should
> definitely be all assigned prior to signaling network setup is successful.
> 
> Basically, I would expect all configured network interfaces to be up and
> configured completely before proceeding. It's not OR condition here but AND.
> Optionally things could proceed after some timeout, like 2 or 3 minutes, if
> we are waiting for external resources that may be failing (eg. DHCP)
> 
> Finally, an answer here that "it would work if you just listen on wildcard
> address" is not really a good answer here.
Comment 10 Adam Majer 2023-06-29 13:10:19 UTC
Created attachment 867884 [details]
trace level log

I've attached trace level log now
Comment 11 Jonathan Kang 2023-06-30 13:25:16 UTC
In your setup, NetworkManager logs "startup complete" when ipv4 succeeds, and this triggers NetworkManager-wait-online to quit. But at that moment, NetworkManager is still at the process of setting manual ipv6 address. This is caused by the "ipv6.may-fail" property defaulting to TRUE. So when dhcp4 succeeded, NetworkManager thought it's time to finish the startup.

Can you try setting "ipv6.may-fail" to FALSE and see if this makes any difference?

> nmcli connection modify Wired\ Connection\ 1 ipv6.may-fail no
Comment 12 Adam Majer 2023-06-30 15:00:34 UTC
(In reply to Jonathan Kang from comment #11)
> In your setup, NetworkManager logs "startup complete" when ipv4 succeeds,
> and this triggers NetworkManager-wait-online to quit. But at that moment,
> NetworkManager is still at the process of setting manual ipv6 address. This
> is caused by the "ipv6.may-fail" property defaulting to TRUE. So when dhcp4
> succeeded, NetworkManager thought it's time to finish the startup.
> 
> Can you try setting "ipv6.may-fail" to FALSE and see if this makes any
> difference?
> 
> > nmcli connection modify Wired\ Connection\ 1 ipv6.may-fail no

setting may-fail to false, seem to fix my problem. Thanks!

The question is then, shouldn't manually defined settings automatically default to may-fail=no, if there is no specific value set in this field?

I assume this probably should be discussed with upstream when may-fail should default to false and when to true. The current case when it always defaults to true is ... confusing.
Comment 13 Jonathan Kang 2023-07-03 12:31:02 UTC
(In reply to Adam Majer from comment #12)
> 
> setting may-fail to false, seem to fix my problem. Thanks!
> 
> The question is then, shouldn't manually defined settings automatically
> default to may-fail=no, if there is no specific value set in this field?

By default, NetworkManager set this property for each new connection to TRUE. This is a safe default for all kinds of connections. Setting this to FALSE may cause  infinite waiting for scenarios where users don't have an IPv6 address.
> 
> I assume this probably should be discussed with upstream when may-fail
> should default to false and when to true. The current case when it always
> defaults to true is ... confusing.

I think this is more of a problem about NetworkManager deciding it's time to quit NetworkManager-wait-online when it got an IPv4 address(without internet access), but at that time the manual IPv6 address is still in the process of pending/configuring.

It sure is a good idea to talk to upstream developers and see what they think about this issue.