Bug 1116767 - Cloud:Tools/cloud-init: Bug network not coming up
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE.org
Component: 3rd party software
Version: unspecified
Hardware: Other Other
Priority: P3 - Medium
Severity: Normal
Assigned To: Robert Schweikert
Reported: 2018-11-20 18:37 UTC by Jon Brightwell
Modified: 2019-08-21 19:47 UTC (History)



Attachments
screenshot of eth0 down (17.13 KB, image/png), 2018-11-20 18:37 UTC, Jon Brightwell
second screenshot of where we see a ping before it stops (23.11 KB, image/png), 2018-11-20 19:05 UTC, Jon Brightwell
cloud-init.log (100.05 KB, text/x-log), 2018-12-04 15:43 UTC, Jon Brightwell
cloud-init.log from a "working" server (91.51 KB, text/plain), 2018-12-04 15:55 UTC, Jon Brightwell
cloud init output 18.5 with patch (3.13 KB, text/x-log), 2019-01-17 09:57 UTC, Jon Brightwell
cloudinit.log 18.5 with patch (96.49 KB, text/x-log), 2019-01-17 09:58 UTC, Jon Brightwell
cloud-init.log v18.5 with dhcp-client (106.52 KB, text/x-log), 2019-01-17 14:41 UTC, Jon Brightwell
cloud-init.log v18.5 with dhcp-client from proper repo (107.07 KB, text/x-log), 2019-01-17 23:37 UTC, Jon Brightwell
cloud-init.log 18.5 route race (107.10 KB, text/x-log), 2019-01-18 10:36 UTC, Jon Brightwell

Description Jon Brightwell 2018-11-20 18:37:37 UTC
Created attachment 790345 [details]
screenshot of eth0 down

I'm trying to diagnose an issue with a Leap 15 (L15) OpenStack image we're building, and I can't find any reason why it has suddenly stopped working apart from (possibly) the cloud-init update.


If I set off a ping against an instance while it is building, it responds to 2 packets and then never responds again.
$ ping 51.68.87.137
PING 51.68.87.137 (51.68.87.137) 56(84) bytes of data.
64 bytes from 51.68.87.137: icmp_seq=122 ttl=46 time=670 ms
64 bytes from 51.68.87.137: icmp_seq=123 ttl=46 time=34.1 ms

This is reproducible.

$ ping 51.68.87.137
PING 51.68.87.137 (51.68.87.137) 56(84) bytes of data.
64 bytes from 51.68.87.137: icmp_seq=18 ttl=46 time=607 ms
64 bytes from 51.68.87.137: icmp_seq=19 ttl=46 time=31.5 ms

I can't be 100% sure of the timing (VNC lag), but it seems to respond to pings shortly after wicked and DHCP are started (approx. 8-9s in) and then dies shortly afterwards, around the cloud-init metadata crawler, where it shows "ci-info eth0 up false" in an ASCII table listing the NICs.
Comment 1 Jon Brightwell 2018-11-20 18:39:26 UTC
Should have mentioned, it's practically a stock L15 openstack image https://build.opensuse.org/package/rdiff/home:zippy:jx:images/kiwi-templates-Leap15-JeOS?opackage=kiwi-templates-Leap15-JeOS&oproject=openSUSE%3ALeap%3A15.0%3AImages&rev=5

We need at least CI 18.4 to get it set up on OVH (hosting provider).
Comment 2 Jon Brightwell 2018-11-20 19:05:59 UTC
Created attachment 790347 [details]
second screenshot of where we see a ping before it stops

one ping only. Somewhere between purge kernels (after wicked startup) and ci info.
Comment 3 Jon Brightwell 2018-11-21 14:19:29 UTC
Seeing the same result in cloud:tools:next version. Last working version of :next was rev 39 + my version of the "Add 0001-Follow-the-ever-bouncing-ball-for-openSUSE-distribut.patch" https://github.com/moozaad/cloud-init/commit/51ac838dde22d45e954199e20c0959af01a792eb
Comment 4 Robert Schweikert 2018-12-04 13:58:49 UTC
Well, we'll need at least the cloud-init log.
Comment 5 Robert Schweikert 2018-12-04 13:59:25 UTC
Also probably want your cloud.cfg file
Comment 6 Jon Brightwell 2018-12-04 14:13:50 UTC
I knew you were going to ask that :) 

Is there a simple way to set a root password in hosted openstack (metadata?) or kiwi so I can get in via the VNC console?
Comment 7 Robert Schweikert 2018-12-04 14:34:14 UTC
Well in kiwi you can create a user with the <user> directive.

But it is probably easier to just take a snapshot of the root volume and attach it to an instance that does not have the problem.
Comment 8 Jon Brightwell 2018-12-04 15:43:09 UTC
Created attachment 791761 [details]
cloud-init.log

cloud init log added.
Comment 9 Jon Brightwell 2018-12-04 15:48:42 UTC
Other log looks like 



Cloud-init v. 18.4 running 'init-local' at Tue, 04 Dec 2018 15:27:28 +0000. Up 10.75 seconds.
Cloud-init v. 18.4 running 'init' at Tue, 04 Dec 2018 15:27:35 +0000. Up 17.83 seconds.
ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
ci-info: | Device |   Up  |  Address  |    Mask   | Scope |     Hw-Address    |
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
ci-info: |  eth0  | False |     .     |     .     |   .   | fa:16:3e:55:48:2c |
ci-info: |   lo   |  True | 127.0.0.1 | 255.0.0.0 |  host |         .         |
ci-info: |   lo   |  True |  ::1/128  |     .     |  host |         .         |
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | Route | Destination | Gateway | Interface | Flags |
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: +-------+-------------+---------+-----------+-------+


On the working version, that eth0 is up and populated with an address.
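
For comparing console output between a failing and a working instance, the "Net device info" table can be read mechanically. The following is only a rough sketch that assumes the table layout shown in this comment; the function names are made up for illustration and are not part of cloud-init. Note that the table reflects interface state at the moment cloud-init prints it, so a "False" here is a hint rather than proof that the NIC stays down.

```python
SAMPLE = """\
ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
ci-info: | Device |   Up  |  Address  |    Mask   | Scope |     Hw-Address    |
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
ci-info: |  eth0  | False |     .     |     .     |   .   | fa:16:3e:55:48:2c |
ci-info: |   lo   |  True | 127.0.0.1 | 255.0.0.0 |  host |         .         |
ci-info: |   lo   |  True |  ::1/128  |     .     |  host |         .         |
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
"""


def parse_net_device_info(console_output):
    """Return one dict per data row of the 'Net device info' table."""
    rows = []
    header = None
    in_table = False
    for line in console_output.splitlines():
        if not line.startswith("ci-info:"):
            continue
        body = line[len("ci-info:"):].strip()
        if body.startswith("+++"):
            # Title line; only the device table is of interest here.
            in_table = "Net device info" in body
            header = None
            continue
        if not in_table or not body.startswith("|"):
            continue  # border line or a row of some other table
        cells = [cell.strip() for cell in body.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def down_devices(console_output):
    """Return the sorted names of devices reported with Up == False."""
    rows = parse_net_device_info(console_output)
    return sorted({row["Device"] for row in rows if row["Up"] == "False"})


print(down_devices(SAMPLE))
```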
Comment 10 Jon Brightwell 2018-12-04 15:52:41 UTC
Interestingly the "working" version I use, has an error in the output

Cloud-init v. 18.4 running 'init-local' at Tue, 04 Dec 2018 15:49:12 +0000. Up 10.21 seconds.
2018-12-04 15:49:12,665 - stages.py[ERROR]: Unable to render networking. Network config is likely broken: No available network renderers found. Searched through list: ['eni', 'sysconfig', 'netplan']
Cloud-init v. 18.4 running 'init' at Tue, 04 Dec 2018 15:49:29 +0000. Up 27.38 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++++
ci-info: +--------+------+------------------------------+-----------------+--------+-------------------+
ci-info: | Device |  Up  |           Address            |       Mask      | Scope  |     Hw-Address    |
ci-info: +--------+------+------------------------------+-----------------+--------+-------------------+
ci-info: |  eth0  | True |         51.68.84.55          | 255.255.255.255 | global | fa:16:3e:88:b0:a4 |
ci-info: |  eth0  | True | fe80::f816:3eff:fe88:b0a4/64 |        .        |  link  | fa:16:3e:88:b0:a4 |
ci-info: |   lo   | True |          127.0.0.1           |    255.0.0.0    |  host  |         .         |
ci-info: |   lo   | True |           ::1/128            |        .        |  host  |         .         |
ci-info: +--------+------+------------------------------+-----------------+--------+-------------------+
Comment 11 Jon Brightwell 2018-12-04 15:55:14 UTC
Created attachment 791765 [details]
cloud-init.log from a "working" server

This is from a VM that shows the network renderer error but otherwise works.
Comment 12 Jon Brightwell 2019-01-16 15:02:32 UTC
Just checked the latest stock image https://build.opensuse.org/package/binaries/openSUSE:Leap:15.0:Images/kiwi-templates-Leap15-JeOS:OpenStack-Cloud/images and still seeing the same issue.
Comment 13 Robert Schweikert 2019-01-16 22:58:51 UTC
Yep, this is https://bugs.launchpad.net/cloud-init/+bug/1812117

Bottom line is that the route information gets lost. Well, it gets written to ifcfg-eth0, as that is the way things work for RH distributions. At the point where the information gets "lost" there is no distro-specific information available, i.e. there is no simple "if opensuse or sles: write_a_route_file()" option.

I will ponder this some more but most likely will depend on a fix from upstream.

There's also a SLES bug #1121878. However, I will not mark this as a duplicate, to avoid cutting out the information in favor of a non-visible bug.

If your gateway IP is the same for all instances you can build a route file into your image as a workaround:

/etc/sysconfig/network/routes
default 51.68.80.1

Or you can inject the file with user data.
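
One way to inject the file with user data is a #cloud-config document using cloud-init's write_files module. The sketch below only generates such a document as text; the helper name is hypothetical, and whether the file is written early enough during first boot to affect the initial network bring-up is an assumption that this thread does not confirm.

```python
def routes_user_data(gateway, path="/etc/sysconfig/network/routes"):
    """Build a #cloud-config document that writes a one-line routes file."""
    return "\n".join([
        "#cloud-config",
        "write_files:",
        "  - path: {}".format(path),
        "    permissions: '0644'",
        "    content: |",
        "      default {}".format(gateway),
        "",
    ])


# Gateway taken from the workaround described above.
user_data = routes_user_data("51.68.80.1")
print(user_data)
```

The resulting text would be passed as user data when launching the instance.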

Sorry this is still fallout from switching over to the "new" network config code in cloud-init
Comment 14 Robert Schweikert 2019-01-17 00:20:33 UTC
OK, I had an idea, I think this will work. Can you please test cloud-init-18.5 from Cloud:Tools:Next.
Comment 15 Jon Brightwell 2019-01-17 09:56:29 UTC
18.5 with your route patch does appear to work for eth0. I'll upload the logs as there were a couple of things that still stood out. I'm still seeing this early on

2019-01-17 09:46:12,497 - dhcp.py[DEBUG]: Skip dhclient configuration: No dhclient command found.
2019-01-17 09:46:12,497 - util.py[DEBUG]: 
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cloudinit/sources/DataSourceOpenStack.py", line 131, in _get_data
    with EphemeralDHCPv4(self.fallback_interface):
  File "/usr/lib/python3.6/site-packages/cloudinit/net/dhcp.py", line 50, in __enter__
    raise NoDHCPLeaseError()
cloudinit.net.dhcp.NoDHCPLeaseError


and in cloud-init-output

2019-01-17 09:46:29,691 - stages.py[ERROR]: Unable to render networking. Network config is likely broken: No available network renderers found. Searched through list: ['eni', 'sysconfig', 'netplan']


I added a second network to openstack but it doesn't appear to pick that up. wicked ifup eth1 works fine though (with a manual /etc/sysconfig entry).
Comment 16 Jon Brightwell 2019-01-17 09:57:20 UTC
Created attachment 794660 [details]
cloud init output 18.5 with patch
Comment 17 Jon Brightwell 2019-01-17 09:58:02 UTC
Created attachment 794661 [details]
cloudinit.log 18.5 with patch
Comment 18 Jon Brightwell 2019-01-17 10:33:54 UTC
From the obj.pkl, looks like eth1 should be up with dhcp.



'networks': [{'network_id': '581fad02-158d-4dc6-81f0-c1ec2794bbec', 'type': 'ipv4', 'netmask': '255.255.240.0', 'link': 'tapf1bc5e01-9c', 'routes': [{'netmask': '0.0.0.0', 'network': '0.0.0.0', 'gateway': '51.68.80.1'}], 'ip_address': '51.68.80.44', 'id': 'network0'}, {'network_id': 'a2c3f116-6a36-4869-8168-f5b55d36aa19', 'type': 'ipv4_dhcp', 'link': 'tapd2a9816e-df', 'id': 'network1'}], 'links': [{'type': 'bgpovs', 'vif_id': 'f1bc5e01-9c92-420f-914d-e2753c936451', 'ethernet_mac_address': 'fa:16:3e:d5:5f:e2', 'id': 'tapf1bc5e01-9c', 'mtu': 1500}, {'type': 'ovs', 'vif_id': 'd2a9816e-dfcf-4153-b4b3-e3f36e351bdd', 'ethernet_mac_address': 'fa:16:3e:f3:db:ba', 'id': 'tapd2a9816e-df', 'mtu': 9000}]}, '_dirty_cache': True, 'dsmode': 'net', 'vendordata_pure': {'cloud-init': '#cloud-config\nmanage_etc_hosts: localhost'}, '_cloud_name': 'openstack', '_network_config': {'version': 1, 'config': [{'type': 'physical', 'mtu': 1500, 'subnets': [{'type': 'static', 'netmask': '255.255.240.0', 'routes': [{'netmask': '0.0.0.0', 'network': '0.0.0.0', 'gateway': '51.68.80.1'}], 'address': '51.68.80.44', 'ipv4': True}], 'mac_address': 'fa:16:3e:d5:5f:e2', 'name': 'eth0'}, {'type': 'physical', 'mtu': 9000, 'subnets': [{'type': 'dhcp4'}], 'mac_address': 'fa:16:3e:f3:db:ba', 'name': 'eth1'}, {'type': 'nameserver', 'address': '213.186.33.99'}]}}

New bug report needed?
Comment 19 Robert Schweikert 2019-01-17 12:52:50 UTC
(In reply to Jon Brightwell from comment #15)
> 18.5 with your route patch does appear to work for eth0. I'll upload the
> logs as there was a couple of things that still stood out. I'm still seeing
> this early on
> 
> 2019-01-17 09:46:12,497 - dhcp.py[DEBUG]: Skip dhclient configuration: No
> dhclient command found.

OK, that is weird. I added dhclient as a dependency to the package and it should get pulled in when you install cloud-init.

"""
Requires:       dhcp-client
"""

That dhclient is not there would indicate a "force install" ignoring the dependency.

> 2019-01-17 09:46:12,497 - util.py[DEBUG]: 
> Traceback (most recent call last):
>   File
> "/usr/lib/python3.6/site-packages/cloudinit/sources/DataSourceOpenStack.py",
> line 131, in _get_data
>     with EphemeralDHCPv4(self.fallback_interface):
>   File "/usr/lib/python3.6/site-packages/cloudinit/net/dhcp.py", line 50, in
> __enter__
>     raise NoDHCPLeaseError()
> cloudinit.net.dhcp.NoDHCPLeaseError
> 
> 
> and in cloud-init-output
> 
> 2019-01-17 09:46:29,691 - stages.py[ERROR]: Unable to render networking.
> Network config is likely broken: No available network renderers found.
> Searched through list: ['eni', 'sysconfig', 'netplan']

This is probably the fallout from not finding the dhclient

I am surprised you get any network connectivity without dhclient installed.
Comment 20 Robert Schweikert 2019-01-17 13:00:27 UTC
After reviewing the log I feel pretty confident that the errors you are seeing are the cascading effect of the missing dhclient.

Can you install dhcp-client in the image and try again?

Thanks
Comment 21 Jon Brightwell 2019-01-17 14:40:30 UTC
It does clear the cloud-init.log dhclient error. The output error is still there and eth1 is still manual.


2019-01-17 14:33:28,750 - stages.py[ERROR]: Unable to render networking. Network config is likely broken: No available network renderers found. Searched through list: ['eni', 'sysconfig', 'netplan']
Cloud-init v. 18.4 running 'init' at Thu, 17 Jan 2019 14:33:44 +0000. Up 30.65 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++++
ci-info: +--------+-------+-----------------------------+-----------------+--------+-------------------+
ci-info: | Device |   Up  |           Address           |       Mask      | Scope  |     Hw-Address    |
ci-info: +--------+-------+-----------------------------+-----------------+--------+-------------------+
ci-info: |  eth0  |  True |         51.68.92.59         | 255.255.255.255 | global | fa:16:3e:7b:01:34 |
ci-info: |  eth0  |  True | fe80::f816:3eff:fe7b:134/64 |        .        |  link  | fa:16:3e:7b:01:34 |
ci-info: |  eth1  | False |              .              |        .        |   .    | fa:16:3e:2c:e7:43 |
ci-info: |   lo   |  True |          127.0.0.1          |    255.0.0.0    |  host  |         .         |
ci-info: |   lo   |  True |           ::1/128           |        .        |  host  |         .         |
ci-info: +--------+-------+-----------------------------+-----------------+--------+-------------------+


Not sure why dhcp-client wasn't included in the image; it wasn't blocked in any way. I explicitly added it to the package list in kiwi and it grabbed it.
Comment 22 Jon Brightwell 2019-01-17 14:41:10 UTC
Created attachment 794749 [details]
cloud-init.log v18.5 with dhcp-client
Comment 23 Robert Schweikert 2019-01-17 22:16:32 UTC
Something is amiss in comment#21; it shows:

"""
Cloud-init v. 18.4 running 'init' at Thu, 17 Jan 2019 14:33:44 +0000. Up 30.65 
"""

that should be 18.5
Comment 24 Robert Schweikert 2019-01-17 22:39:49 UTC
The good news is that with dhclient in place we are now picking up the network configuration from OpenStack:

2019-01-17 14:33:28,725 - DataSourceOpenStack.py[DEBUG]: network config provided via network_json

2019-01-17 14:33:28,735 - stages.py[DEBUG]: applying net config names for {'version': 1, 'config': [{'type': 'physical', 'mtu': 1500, 'subnets': [{'type': 'static', 'netmask': '255.255.240.0', 'routes': [{'netmask': '0.0.0.0', 'network': '0.0.0.0', 'gateway': '51.68.80.1'}], 'address': '51.68.92.59', 'ipv4': True}], 'mac_address': 'fa:16:3e:7b:01:34', 'name': 'eth0'}, {'type': 'physical', 'mtu': 9000, 'subnets': [{'type': 'dhcp4'}], 'mac_address': 'fa:16:3e:2c:e7:43', 'name': 'eth1'}, {'type': 'nameserver', 'address': '213.186.33.99'}]}

The system should find the sysconfig renderer. Since it is not found, this implies that the image does not have "ifup" and "ifdown" in the expected places, which would be '/sbin' and '/usr/sbin'.

ifup/ifdown are supplied by wicked-service

I think there is an issue with your image.
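
A quick way to check an image for this condition is to look for the two commands in the directories named above. This is a sketch only: the function name and the root parameter (meant for pointing at a mounted copy of the image's root volume) are illustrative assumptions, not cloud-init's own detection code.

```python
import os


def missing_tools(tools=("ifup", "ifdown"),
                  search_dirs=("/sbin", "/usr/sbin"),
                  root="/"):
    """Return the tools not found as executable files in search_dirs under root."""
    missing = []
    for tool in tools:
        candidates = [os.path.join(root, d.lstrip("/"), tool) for d in search_dirs]
        if not any(os.path.isfile(p) and os.access(p, os.X_OK) for p in candidates):
            missing.append(tool)
    return missing


# An empty list means both commands were found on the running system.
print(missing_tools())
```

When checking a mounted image via root, absolute symlinks inside the image resolve against the host filesystem, so results for symlinked commands should be double-checked by hand.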
Comment 25 Jon Brightwell 2019-01-17 23:21:26 UTC
I'm swearing vehemently at my screen as I just realised I wasted a bunch of your time and mine. Sorry. It was still using my broken but working patched version of 18.4 as I typo'd cloud:tools:next's repo.

    
Kiwi doesn't give an error if the repo is wrong.
    <repository type="rpm-md" priority="94" imageinclude="true">
        <source path='obs://Cloud:Tools:Next/openSUSE_Leap_15.0'/>
    </repository>
    <repository type="rpm-md" priority="95" imageinclude="true">
        <source path='obs://home:zippy:jx:packages/openSUSE_Leap_15.0'/>
    </repository>
    <repository type="rpm-md" priority="97" >
        <source path='obs://openSUSE:Leap:15.0:Update/standard'/>
    </repository>
    <repository type="rpm-md" priority="98" >
        <source path='obs://openSUSE:Leap:15.0/standard'/>
    </repository>

Working version below - spot the difference? cloud:tools is with the .0 just to be difficult!

    <repository type="rpm-md" priority="94" imageinclude="true">
        <source path='obs://Cloud:Tools:Next/openSUSE_Leap_15'/>
    </repository>
    <repository type="rpm-md" priority="95" imageinclude="true">
        <source path='obs://home:zippy:jx:packages/openSUSE_Leap_15.0'/>
    </repository>
    <repository type="rpm-md" priority="97" >
        <source path='obs://openSUSE:Leap:15.0:Update/standard'/>
    </repository>
    <repository type="rpm-md" priority="98" >
        <source path='obs://openSUSE:Leap:15.0/standard'/>
    </repository>
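
Since kiwi reportedly does not complain about a wrong repo path, it can help to list the OBS project/repository pairs from the description and compare them by eye against what each project actually publishes. The sketch below only parses a fragment like the one above; the function name is made up, and it does not contact OBS or validate anything.

```python
import xml.etree.ElementTree as ET

KIWI_FRAGMENT = """
    <repository type="rpm-md" priority="94" imageinclude="true">
        <source path='obs://Cloud:Tools:Next/openSUSE_Leap_15'/>
    </repository>
    <repository type="rpm-md" priority="98" >
        <source path='obs://openSUSE:Leap:15.0/standard'/>
    </repository>
"""


def obs_sources(kiwi_fragment):
    """Return (project, repository) pairs for every obs:// source path."""
    # Wrap the fragment so sibling <repository> elements parse as one document.
    root = ET.fromstring("<image>" + kiwi_fragment + "</image>")
    pairs = []
    for source in root.iter("source"):
        path = source.get("path", "")
        if path.startswith("obs://"):
            project, _, repo = path[len("obs://"):].partition("/")
            pairs.append((project, repo))
    return pairs


for project, repo in obs_sources(KIWI_FRAGMENT):
    print(project, repo)
```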

I got out of bed to rerun the tests. Just waiting on OBS/kiwi.
Comment 26 Jon Brightwell 2019-01-17 23:36:10 UTC
ok, confirmed 18.5 and dhcp-client.

Cloud output shows the following, with no net renderer errors.
ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
ci-info: | Device |   Up  |  Address  |    Mask   | Scope |     Hw-Address    |
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
ci-info: |  eth0  | False |     .     |     .     |   .   | fa:16:3e:04:91:c7 |
ci-info: |  eth1  | False |     .     |     .     |   .   | fa:16:3e:b5:44:f8 |
ci-info: |   lo   |  True | 127.0.0.1 | 255.0.0.0 |  host |         .         |
ci-info: |   lo   |  True |  ::1/128  |     .     |  host |         .         |
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | Route | Destination | Gateway | Interface | Flags |
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: +-------+-------------+---------+-----------+-------+


but after logging in it shows both NICs up and with addresses. /etc/sysconfig/network is also nicely populated.

test18:/home/opensuse # ll /etc/sysconfig/network
total 68
-rw-r--r-- 1 root root  9692 Jan 17 23:17 config
-rw-r--r-- 1 root root 13519 Jan 17 23:17 dhcp
drwxr-xr-x 2 root root     6 Jun  7  2018 if-down.d
drwxr-xr-x 2 root root     6 Jun  7  2018 if-up.d
-rw-r--r-- 1 root root   275 Jan 17 23:27 ifcfg-eth0
-rw-r--r-- 1 root root   200 Jan 17 23:27 ifcfg-eth1
-rw------- 1 root root   147 Dec 12 19:17 ifcfg-lo
-rw-r--r-- 1 root root 21738 Jun 11  2018 ifcfg.template
-rw-r--r-- 1 root root    19 Jan 17 23:27 ifroute-eth0
drwx------ 2 root root     6 Jun  7  2018 providers
drwxr-xr-x 2 root root    60 Jan 17 23:17 scripts
test18:/home/opensuse # cat /etc/sysconfig/network/ifcfg-eth1
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=dhcp
DEVICE=eth1
HWADDR=fa:16:3e:b5:44:f8
MTU=9000
NM_CONTROLLED=no
ONBOOT=yes
STARTMODE=auto
TYPE=Ethernet
USERCTL=no
test18:/home/opensuse # cat /etc/sysconfig/network/ifroute-eth0
default 51.68.80.1


I'll upload the cloud-init.log in case you can spot something I didn't (again), but I honestly think this is now good to go! Thank you for your help.
Comment 27 Jon Brightwell 2019-01-17 23:37:30 UTC
Created attachment 794797 [details]
cloud-init.log v18.5 with dhcp-client from proper repo
Comment 28 Robert Schweikert 2019-01-18 09:54:20 UTC
Thanks for testing. On its way to Factory and will eventually trickle into openSUSE Leap.

created request id 666934
Comment 29 Jon Brightwell 2019-01-18 09:56:06 UTC
Still found an intermittent issue with the network not coming up right. I've fired up 5 instances this morning to expand my test case and found 3 failures. I reran the most basic test (2 networks, no routing apart from default gw) and had 2/3 successes.

Just gathering evidence.
Comment 30 Jon Brightwell 2019-01-18 10:35:11 UTC
Looks like we have a race. 3/10 cloned instances passed an external ping test. I managed to get onto the others via a jumpbox on the private net. The issue is that the routes are wrong.


Non-working box.

default via 10.150.0.1 dev eth1 proto dhcp 
10.150.0.0/16 dev eth1 proto kernel scope link src 10.150.1.59 
51.68.80.0/20 dev eth0 proto kernel scope link src 51.68.89.122 
169.254.169.254 via 10.150.1.1 dev eth1 proto dhcp 

# eth0
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=static
DEFROUTE=yes
DEVICE=eth0
GATEWAY=51.68.80.1
HWADDR=fa:16:3e:25:b4:59
IPADDR=51.68.89.122
MTU=1500
NETMASK=255.255.240.0
NM_CONTROLLED=no
ONBOOT=yes
STARTMODE=auto
TYPE=Ethernet
USERCTL=no


# eth1
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=dhcp
DEVICE=eth1
HWADDR=fa:16:3e:b1:ca:29
MTU=9000
NM_CONTROLLED=no
ONBOOT=yes
STARTMODE=auto
TYPE=Ethernet
USERCTL=no

#ifroute-eth0 
default 51.68.80.1
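
The mismatch on the non-working box can be spotted mechanically by comparing the default route in the routing table with the gateway in ifroute-eth0. The sketch below assumes the listing above is `ip route` output and parses only "default via ... dev ..." lines; the function name is illustrative.

```python
ROUTES = """\
default via 10.150.0.1 dev eth1 proto dhcp
10.150.0.0/16 dev eth1 proto kernel scope link src 10.150.1.59
51.68.80.0/20 dev eth0 proto kernel scope link src 51.68.89.122
169.254.169.254 via 10.150.1.1 dev eth1 proto dhcp
"""

EXPECTED_GATEWAY = "51.68.80.1"  # from ifroute-eth0 above


def default_routes(ip_route_output):
    """Return (gateway, device) for each 'default via <gw> dev <dev>' line."""
    found = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if (len(fields) >= 5 and fields[0] == "default"
                and fields[1] == "via" and fields[3] == "dev"):
            found.append((fields[2], fields[4]))
    return found


for gateway, device in default_routes(ROUTES):
    status = "ok" if gateway == EXPECTED_GATEWAY else "unexpected"
    print(gateway, device, status)
```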



nothing unusual in cloud output

Cloud-init v. 18.5 running 'init-local' at Fri, 18 Jan 2019 10:14:52 +0000. Up 8.89 seconds.
Cloud-init v. 18.5 running 'init' at Fri, 18 Jan 2019 10:14:59 +0000. Up 15.66 seconds.
ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
ci-info: | Device |   Up  |  Address  |    Mask   | Scope |     Hw-Address    |
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
ci-info: |  eth0  | False |     .     |     .     |   .   | fa:16:3e:25:b4:59 |
ci-info: |  eth1  | False |     .     |     .     |   .   | fa:16:3e:b1:ca:29 |
ci-info: |   lo   |  True | 127.0.0.1 | 255.0.0.0 |  host |         .         |
ci-info: |   lo   |  True |  ::1/128  |     .     |  host |         .         |
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | Route | Destination | Gateway | Interface | Flags |
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: +-------+-------------+---------+-----------+-------+


cloud-init.log to follow.
Comment 31 Jon Brightwell 2019-01-18 10:36:05 UTC
Created attachment 794847 [details]
cloud-init.log 18.5  route race
Comment 32 Swamp Workflow Management 2019-01-18 10:40:07 UTC
This is an autogenerated message for OBS integration:
This bug (1116767) was mentioned in
https://build.opensuse.org/request/show/666936 Factory / cloud-init
Comment 33 Jon Brightwell 2019-01-18 12:40:07 UTC
As a side note, I spotted this behaviour in a different way (not cloud-init based) when bringing up eth1 manually in the VM. I ended up setting

DHCLIENT_SET_DEFAULT_ROUTE="no"

in ifcfg-eth1


Maybe cloud-init could detect the default gateway before setting up the networks and add that to the other configs?
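
The manual workaround above (setting DHCLIENT_SET_DEFAULT_ROUTE in ifcfg-eth1) amounts to setting one KEY="value" line in a sysconfig-style file. A minimal sketch of that text edit follows; the helper name is hypothetical and it works on a string rather than touching files. It is not how cloud-init renders ifcfg files.

```python
IFCFG_ETH1 = """\
BOOTPROTO=dhcp
DEVICE=eth1
MTU=9000
STARTMODE=auto
"""


def set_sysconfig_var(text, key, value):
    """Return text with KEY="value" set, replacing an existing assignment or appending one."""
    new_line = '{}="{}"'.format(key, value)
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        # Commented-out assignments do not match because the name starts with '#'.
        if sep and name.strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


updated = set_sysconfig_var(IFCFG_ETH1, "DHCLIENT_SET_DEFAULT_ROUTE", "no")
print(updated)
```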
Comment 34 Robert Schweikert 2019-01-18 14:03:52 UTC
Can you please attach the generated ifroute-* files?

cloud-init can only work with the data that is provided, i.e. with the data in the "network:" section of the config. The code cannot make any assumptions about the state of the system or poke around and try to find "this may be a working network configuration"

If 

DHCLIENT_SET_DEFAULT_ROUTE="no"

works as a general rule, I'll have to look into this a bit. Then I can certainly add this to all "if name != eth0". Of course that would completely fail if someone uses predictable interface names as in that case we wouldn't know which is equivalent to eth0
Comment 35 Jon Brightwell 2019-01-18 14:27:07 UTC
The only ifroute-* file is

> cat ifroute-eth0 
default 51.68.80.1



We'd definitely have to detect the default gateway and only set DHCLIENT_SET_DEFAULT_ROUTE="no" on the remaining. It certainly wouldn't make sense to presume eth0 is it. It looks like you can pull it from the config by checking for the 0.0.0.0 route:

{'version': 1, 'config': [{'type': 'physical', 'mtu': 1500, 'subnets': [{'type': 'static', 'netmask': '255.255.240.0', 'routes': [{'netmask': '0.0.0.0', 'network': '0.0.0.0', 'gateway': '51.68.80.1'}], 'address': '51.68.89.122', 'ipv4': True}], 'mac_address': 'fa:16:3e:25:b4:59', 'name': 'eth0'}, {'type': 'physical', 'mtu': 9000, 'subnets': [{'type': 'dhcp4'}], 'mac_address': 'fa:16:3e:b1:ca:29', 'name': 'eth1'}, {'type': 'nameserver', 'address': '213.186.33.99'}]}
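
The check suggested here could look roughly like the sketch below, which splits the physical interfaces of a version 1 network config into those carrying a 0.0.0.0/0.0.0.0 route and the rest. The function name is made up, the dict is the config quoted above reformatted, and only the `routes` lists are inspected. It is an illustration of the idea, not the patch that went into the package.

```python
NETWORK_CONFIG = {
    "version": 1,
    "config": [
        {"type": "physical", "name": "eth0", "mtu": 1500,
         "mac_address": "fa:16:3e:25:b4:59",
         "subnets": [{"type": "static", "address": "51.68.89.122",
                      "netmask": "255.255.240.0", "ipv4": True,
                      "routes": [{"network": "0.0.0.0", "netmask": "0.0.0.0",
                                  "gateway": "51.68.80.1"}]}]},
        {"type": "physical", "name": "eth1", "mtu": 9000,
         "mac_address": "fa:16:3e:b1:ca:29",
         "subnets": [{"type": "dhcp4"}]},
        {"type": "nameserver", "address": "213.186.33.99"},
    ],
}


def split_by_default_route(network_config):
    """Return (owners, others): interface names with and without a default route."""
    owners, others = [], []
    for item in network_config.get("config", []):
        if item.get("type") != "physical":
            continue
        has_default = any(
            route.get("network") == "0.0.0.0" and route.get("netmask") == "0.0.0.0"
            for subnet in item.get("subnets", [])
            for route in subnet.get("routes", [])
        )
        (owners if has_default else others).append(item["name"])
    return owners, others


owners, others = split_by_default_route(NETWORK_CONFIG)
print("default route on:", owners)
print("candidates for DHCLIENT_SET_DEFAULT_ROUTE=no:", others)
```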
Comment 36 Jon Brightwell 2019-01-18 14:35:00 UTC
or the reverse logic, if it has no 0.0.0.0, then add the var to ifcfg-x.
Comment 37 Jon Brightwell 2019-01-18 14:55:59 UTC
It can also go in /etc/sysconfig/network/dhcp as we know that openstack provides all routes including the default one.

## Type:        yesno
## Default:     yes
#
# Should the DHCP client set a default route (default Gateway) (yes|no)
#
# When multiple copies of dhcp client run, it would make sense that only
# one of them does it. 
#
DHCLIENT_SET_DEFAULT_ROUTE="yes"


Would that be a cloud-init issue or require the admin to know to set this though?
Comment 38 Robert Schweikert 2019-01-18 17:12:07 UTC
(In reply to Jon Brightwell from comment #36)
> or the reverse logic, if it has no 0.0.0.0, then add the var to ifcfg-x.

Well that kind of implies that the route processing, i.e. creation of ifroute-* and interface configuration, i.e. ifcfg-* creation happen at the same time.

I'll have to check if we have the right data to make such determination in the right place.

cloud-init does have options to set configurations in non interface specific files, specifically /etc/sysconfig/network, which is a file on RH distros. This kind of general configuration setting is not supported on SUSE distros at this time. Also not something I could add in a patch during a package update. That would be something I have to discuss with upstream and we'll have to come to an agreement on implementation to get it accepted.

So to get the immediate concern resolved I will figure out if I have sufficient data to write DHCLIENT_SET_DEFAULT_ROUTE="no" to ifcfg-* if there is no routing information in the network config. This brings with it the risk that if someone does want the default route from a dhcp server then they cannot have it. But we'll let those that have that problem file a bug and then go from there.

I don't need any more info ATM, thus clearing the flag. Thanks for the info provided so far.
Comment 39 Robert Schweikert 2019-01-18 19:07:22 UTC
Checking for the presence of "0.0.0.0" as a route is flawed, for example in EC2 we have:

{'version': 1, 'config': [{'type': 'physical', 'name': 'eth0', 'subnets': [{'type': 'dhcp4'}], 'mac_address': '0e:45:3c:2d:80:6c'}]}

No route is defined, so that config would meet the "no 0.0.0.0 route" criterion, but we do want the dhcp client to set the default route.

Needs more thinking.
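
The EC2 counterexample can be reproduced with a few lines. The sketch below contrasts the reverse logic from comment #36 with one possible guard (only suppress the DHCP default route when some other interface in the same config supplies one). Both function names are hypothetical, and the guarded variant is merely an assumption about how the check might be refined, not a description of what the package ended up doing.

```python
EC2_CONFIG = {
    "version": 1,
    "config": [
        {"type": "physical", "name": "eth0",
         "subnets": [{"type": "dhcp4"}],
         "mac_address": "0e:45:3c:2d:80:6c"},
    ],
}


def has_default_route(item):
    """True if any subnet of the interface lists a route to network 0.0.0.0."""
    return any(
        route.get("network") == "0.0.0.0"
        for subnet in item.get("subnets", [])
        for route in subnet.get("routes", [])
    )


def naive_suppress(config):
    """Reverse logic: every interface without a 0.0.0.0 route gets the variable."""
    return [item["name"] for item in config["config"]
            if item.get("type") == "physical" and not has_default_route(item)]


def guarded_suppress(config):
    """Suppress only when another interface in the config supplies a default route."""
    physical = [item for item in config["config"] if item.get("type") == "physical"]
    if not any(has_default_route(item) for item in physical):
        return []  # nobody else provides a default route, so leave DHCP alone
    return [item["name"] for item in physical if not has_default_route(item)]


print("naive:", naive_suppress(EC2_CONFIG))
print("guarded:", guarded_suppress(EC2_CONFIG))
```

With the EC2 config, the naive check selects eth0 for suppression, which would leave the instance without a default route; the guarded check selects nothing.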
Comment 40 Jon Brightwell 2019-01-18 19:11:06 UTC
But does aws advertise itself as openstack?

The top of the pickled config shows.. {'datasource_list': ['OpenStack', 'None']
Comment 41 Robert Schweikert 2019-01-19 22:35:15 UTC
OK that was an interesting deep dive into the network config implementation of cloud-init. Not all bits and pieces are in place, but the current package should work for you.

Please test the package from the Cloud:Tools project; since I had moved 18.5 already I figured I might as well fix things there.

Thanks
Comment 42 Jon Brightwell 2019-01-21 09:45:10 UTC
Did a mass test of 9 instances with 3 NICs (1 cloud-init setup, 2 DHCP with one of them with custom routes). They all passed. Only one ifroute for eth0, the rest from dhcp loaded dynamically as expected.

Let me know if there's any other tests you want me to run.
Comment 43 Jon Brightwell 2019-01-21 10:54:30 UTC
Found a broken case. If the default gateway isn't on the first NIC, it seems to not bring anything up. I'm just gathering logs etc.
Comment 44 Robert Schweikert 2019-01-21 12:43:56 UTC
Well the idea of "the first NIC" is broken as soon as net.ifnames = 1, i.e. once predictable NIC names are used.

What happens if you copy the ifroute-* file to /etc/sysconfig/network/routes for the broken case? Not that I have any idea how I would implement that ATM, but at least I'd know the behavior.
Comment 45 Robert Schweikert 2019-01-21 13:31:54 UTC
Comment#42 probably describes the most common use case. Since that is fixed now I am going to push the current code into Factory and trickle it into Leap via maintenance releases in SLES.

We can continue to investigate and work on the "corner case" described in comment#43. Can you please file a new bug for that?
Comment 46 Robert Schweikert 2019-01-21 14:10:55 UTC
On the way to Factory: created request id 667611
Comment 47 Jon Brightwell 2019-01-21 14:13:22 UTC
Rgr, Thanks for your help again!
Comment 48 Robert Schweikert 2019-01-21 14:20:02 UTC
Issue from comment#43 will be handled in a separate bug
Comment 50 Swamp Workflow Management 2019-01-21 14:50:07 UTC
This is an autogenerated message for OBS integration:
This bug (1116767) was mentioned in
https://build.opensuse.org/request/show/667611 Factory / cloud-init
Comment 52 Swamp Workflow Management 2019-01-23 14:20:08 UTC
This is an autogenerated message for OBS integration:
This bug (1116767) was mentioned in
https://build.opensuse.org/request/show/668094 Factory / cloud-init
Comment 54 Swamp Workflow Management 2019-01-24 10:00:08 UTC
This is an autogenerated message for OBS integration:
This bug (1116767) was mentioned in
https://build.opensuse.org/request/show/668094 Factory / cloud-init
Comment 60 Swamp Workflow Management 2019-06-18 13:11:13 UTC
SUSE-RU-2019:1542-1: An update that has 8 recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1116767,1119397,1121878,1123694,1125950,1125992,1126101,1132692
CVE References: 
Sources used:
SUSE Linux Enterprise Module for Public Cloud 15 (src):    cloud-init-18.5-5.8.1
SUSE Linux Enterprise Module for Open Buildservice Development Tools 15 (src):    cloud-init-18.5-5.8.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 61 Swamp Workflow Management 2019-06-24 16:12:03 UTC
openSUSE-RU-2019:1613-1: An update that has 8 recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1116767,1119397,1121878,1123694,1125950,1125992,1126101,1132692
CVE References: 
Sources used:
openSUSE Leap 15.0 (src):    cloud-init-18.5-lp150.2.13.1
Comment 62 Swamp Workflow Management 2019-06-27 13:14:41 UTC
SUSE-RU-2019:1715-1: An update that has 15 recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1087331,1095627,1097388,1099340,1101894,1111427,1114160,1116767,1119397,1121878,1123694,1125950,1125992,1126101,1132692
CVE References: 
Sources used:
SUSE Linux Enterprise Software Development Kit 12-SP4 (src):    dhcp-4.3.3-10.16.4
SUSE Linux Enterprise Software Development Kit 12-SP3 (src):    dhcp-4.3.3-10.16.4
SUSE Linux Enterprise Server 12-SP4 (src):    dhcp-4.3.3-10.16.4
SUSE Linux Enterprise Server 12-SP3 (src):    dhcp-4.3.3-10.16.4
SUSE Linux Enterprise Module for Public Cloud 12 (src):    cloud-init-18.5-37.21.1
SUSE Linux Enterprise Desktop 12-SP4 (src):    dhcp-4.3.3-10.16.4
SUSE Linux Enterprise Desktop 12-SP3 (src):    dhcp-4.3.3-10.16.4
SUSE CaaS Platform 3.0 (src):    cloud-init-18.5-37.21.1, dhcp-4.3.3-10.16.4
OpenStack Cloud Magnum Orchestration 7 (src):    cloud-init-18.5-37.21.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 63 Swamp Workflow Management 2019-07-01 10:14:25 UTC
openSUSE-RU-2019:1681-1: An update that has 15 recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1087331,1095627,1097388,1099340,1101894,1111427,1114160,1116767,1119397,1121878,1123694,1125950,1125992,1126101,1132692
CVE References: 
Sources used:
openSUSE Leap 42.3 (src):    cloud-init-18.5-40.1, dhcp-4.3.3-11.9.1
Comment 65 Carsten Hoeger 2019-07-11 13:23:54 UTC
I seem to have this problem with this image http://download.opensuse.org/repositories/Cloud:/Images:/Leap_15.1/images/openSUSE-Leap-15.1-OpenStack.x86_64-0.0.4-Build6.2.qcow2

Cloud-init v. 19.1 running 'init-local' at Thu, 11 Jul 2019 12:17:36 +0000. Up 8.50 seconds.
2019-07-11 12:17:42,767 - stages.py[ERROR]: Unable to render networking. Network config is likely broken: No available network renderers found. Searched through list: ['eni', 'sysconfig', 'netplan']

Does it mean it isn't fixed there or is that a different cause?
Comment 66 Swamp Workflow Management 2019-07-29 16:16:24 UTC
SUSE-RU-2019:2005-1: An update that has 9 recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1116767,1119397,1121878,1123694,1125950,1125992,1126101,1132692,1136440
CVE References: 
Sources used:
SUSE Linux Enterprise Module for Public Cloud 15-SP1 (src):    cloud-init-19.1-8.3.1
SUSE Linux Enterprise Module for Open Buildservice Development Tools 15-SP1 (src):    cloud-init-19.1-8.3.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 68 Robert Schweikert 2019-08-21 19:47:35 UTC
(In reply to Carsten Hoeger from comment #65)
> I seem to have this problem with this image
> http://download.opensuse.org/repositories/Cloud:/Images:/Leap_15.1/images/
> openSUSE-Leap-15.1-OpenStack.x86_64-0.0.4-Build6.2.qcow2
> 
> Cloud-init v. 19.1 running 'init-local' at Thu, 11 Jul 2019 12:17:36 +0000.
> Up 8.50 seconds.
> 2019-07-11 12:17:42,767 - stages.py[ERROR]: Unable to render networking.
> Network config is likely broken: No available network renderers found.
> Searched through list: ['eni', 'sysconfig', 'netplan']
> 
> Does it mean it isn't fixed there or is that a different cause?

Looks like the cloud.cfg file in that image does not set the correct distribution. The renderer to be used is determined based on a hard coded list of distros. "sysconfig" will be found if the distribution in cloud.cfg is one of

'opensuse', 'sles', 'suse', 'redhat', 'fedora', 'centos'
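
A rough way to check an image for this is to read the `distro:` value from cloud.cfg and compare it with the list above. The sketch assumes the value appears on a single `distro:` line (as under `system_info:`), uses a regular expression instead of a YAML parser, and uses made-up function names. The distro list is copied from this comment and may differ in other cloud-init versions.

```python
import re

SYSCONFIG_DISTROS = ("opensuse", "sles", "suse", "redhat", "fedora", "centos")


def configured_distro(cloud_cfg_text):
    """Return the value of the first 'distro:' key found, or None."""
    match = re.search(
        r"^\s*distro:\s*['\"]?([\w-]+)['\"]?\s*(?:#.*)?$",
        cloud_cfg_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


def uses_sysconfig_renderer(cloud_cfg_text):
    """True if the configured distro is in the list quoted in this comment."""
    return configured_distro(cloud_cfg_text) in SYSCONFIG_DISTROS


# Hypothetical cloud.cfg excerpts, for illustration only.
print(uses_sysconfig_renderer("system_info:\n  distro: opensuse\n"))
print(uses_sysconfig_renderer("system_info:\n  distro: some-other-distro\n"))
```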