Bug 745859

Summary: network reports failure at start under systemd
Product: [openSUSE] openSUSE 12.1 Reporter: Marius Tomaschewski <mt>
Component: BasesystemAssignee: Marius Tomaschewski <mt>
Status: RESOLVED DUPLICATE QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P5 - None CC: fcrozat, koenig, mt
Version: Final   
Target Milestone: ---   
Hardware: x86-64   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Bug Depends on: 744293    
Bug Blocks:    
Attachments: network debug log

Description Marius Tomaschewski 2012-02-08 15:17:40 UTC
+++ This bug was initially created as a clone of Bug #744293 +++

New bug nr for https://bugzilla.novell.com/show_bug.cgi?id=744293#c22:
======================================================================
Feb  7 17:20:42 t3 kernel: [   22.656993] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Feb  7 17:20:42 t3 kernel: [   22.657887] ADDRCONF(NETDEV_CHANGE): eth0: link
becomes ready
Feb  7 17:20:42 t3 ifup-dhcp:     eth0      Starting DHCP4 client
Feb  7 17:20:42 t3 dhcpcd[1900]: eth0: dhcpcd 3.2.3 starting
...
Feb  7 17:21:07 t3 network[866]: eth0      device: Intel Corporation 82573E
Gigabit Ethernet Con
Feb  7 17:21:07 t3 network[866]: eth0      . . . is just beeing set up
Feb  7 17:21:07 t3 network[866]: eth0      IP address: 192.168.178.47/24
Feb  7 17:21:07 t3 network[866]: eth0      is up
Feb  7 17:21:07 t3 network[866]: [1Awaiting
Feb  7 17:21:07 t3 network[866]: eth0      interface could not be set up until
now
Feb  7 17:21:07 t3 network[866]: ..failedSetting up service (localfs) network 
.  .  .  .  .  .  .  .  .  ...failed
Feb  7 17:21:07 t3 systemd[1]: network.service: control process exited,
code=exited status=7
Feb  7 17:21:07 t3 systemd[1]: Unit network.service entered failed state.
Feb  7 17:21:08 t3 rpcbind[3690]: Starting rpcbind ..done
======================================================================

Network may report failure in some cases, e.g. IPv6 duplicate address
detection failure [also for IPv4 when enabled], that is in case of a
true failure:

network[824]: eth0      IP address: 192.168.2.5/24
network[824]: eth0      IP address: 2001:aaaa:bbbb:cccc::5/64
network[824]: eth0      is up, but has tentative ipv6 address
network[824]: eth0      interface could not be set up until now
Failed to start LSB: Configure the localfs depending network interfaces [ESC[1;31mFAILED
See 'systemctl status network.service' for details.

or when it needs longer than the time limit WAIT_FOR_INTERFACES="30"
because of some other reason [e.g. bug 724777, bnx2 hardware needs
too long to detect the link, dhcp server can't be reached].

When dhcp is in use [or IPv6 dad still in progress in the kernel],
it will first report "in background" and after e.g. dhcp server is
reachable again "rcnetwork status" may report success later.


The above log looks like dhcpcd would be unable to get address
or would die(?) and didn't reported "completed" state back to ifup. 

Please set DEBUG=EXTRA in /etc/sysconfig/network/config, reboot and
attach a "tar cvzpf /tmp/network-debug.tgz /dev/.sysconfig/network".
Comment 1 Marius Tomaschewski 2012-02-16 14:09:30 UTC
Please make sure you have the most recent systemd >= 37-3.151.1 from:
http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/

install the sysconfig package (changelog from today, > 0.75.4-2.8) from:

http://download.opensuse.org/repositories/home:/mtomaschewski:/branches:/openSUSE:/12.1:/Update:/Test/openSUSE_12.1/

------------------------------------------------------------------
Thu Feb 16 14:03:59 UTC 2012 - mt@suse.com
- Test patch: 0010-network-start-fails-under-systemd.test.patch
  bnc#734676: network bonding does not work when used with systemd
  bnc#737636: network is started then stopped right away
  bnc#745859: network reports failure at start under systemd
  Set DEBUG=EXTRA, reboot and attach tar of /dev/.sysconfig/network.
------------------------------------------------------------------

Then set DEBUG=EXTRA in /etc/sysconfig/network/config, reboot and attach
a tar.bz2 of /dev/.sysconfig/network when it occurs again.
Comment 2 Harald Koenig 2012-02-16 15:45:09 UTC
(In reply to comment #1)
> Please make sure you have the most recent systemd >= 37-3.151.1 from:
> http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/
> 
> install the sysconfig package (changelog from today, > 0.75.4-2.8) from:
> 
> http://download.opensuse.org/repositories/home:/mtomaschewski:/branches:/openSUSE:/12.1:/Update:/Test/openSUSE_12.1/

done>  rpm -q systemd sysconfig
systemd-37-3.6.1.x86_64
sysconfig-0.75.4-2.5.1.x86_64


> Then set DEBUG=EXTRA in /etc/sysconfig/network/config, reboot and attach
> a tar.bz2 of /dev/.sysconfig/network when it occurs again.

occurs always -- voila...
Comment 3 Harald Koenig 2012-02-16 15:46:22 UTC
Created attachment 476507 [details]
network debug log
Comment 4 Frederic Crozat 2012-02-16 16:09:52 UTC
you didn't upgrade systemd (>= 37-3.151.1) from :
> http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/
Comment 5 Harald Koenig 2012-02-16 17:14:17 UTC
(In reply to comment #4)
> you didn't upgrade systemd (>= 37-3.151.1) from :
> > http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/

yes I did -- but I was running the "rpm -q ..." on the wrong system. 
sorry for the confusion:-(

this should have read:

# hostname
t3
# rpm -q systemd sysconfig
systemd-37-3.152.1.x86_64
sysconfig-0.75.4-2.8.test.1.x86_64

and with time stamps:

# rpm -qa --last | head -2
sysconfig-0.75.4-2.8.test.1                   Thu Feb 16 15:54:31 2012
systemd-37-3.152.1                            Thu Feb 16 15:53:50 2012
Comment 6 Marius Tomaschewski 2012-02-21 13:28:01 UTC
OK, let's take a closer look at the debug output...

BTW: The start order is from:
  for i in exdeb.* ; do echo "`head -1 \"$i\"` $i" ; done | cut -b 12- | sort

16:01:57.004148802 exdeb.network_915.1._.start
16:01:57.770974214 exdeb.ifstatus_1098.915._.lo
16:01:58.579618724 exdeb.ifup_1168.915._.lo.-o.rc.onboot
16:01:59.107087870 exdeb.ifup_1292.915._.eth0.-o.rc.onboot
16:01:59.323404410 exdeb.ifup-dhcp_1353.1292._.eth0.eth0.-o.rc.onboot

now the dhcpcd has a lease and starts the hook:

/etc/sysconfig/network/scripts/dhcpcd-hook /var/lib/dhcpcd/dhcpcd-eth0.info new

and the "new" action calls "netconfig modify" to apply the settings
[e.g. set nameserver] and starts "ifdown eth0 -o dhcp":

16:02:06.270190635 exdeb.ifdown_2379.2249._.eth0.-o.dhcp

that executes "if-down" and "ifdown-services" scripts:

+ if-down.d/21-cifs eth0 eth0 -o dhcp
+ if-down.d/21-dhcpcd-hook-samba eth0 eth0 -o dhcp
+ if-down.d/55-samba-winbindd eth0 eth0 -o dhcp

When this is done, dhcpcd finishes its set up (e.g. sets hostname)

The exdeb.ifup_2505.2500._.eth0.-o.dhcp file has been trashed with a
"redirecting to systemctl" message....

But looking into the next one (exdeb.ifup_3265.915._.ib0.-o.rc.onboot),
shows, that dhcpcd executed the "complete" hook call (PID 2500):

/etc/sysconfig/network/scripts/dhcpcd-hook /var/lib/dhcpcd/dhcpcd-eth0.info complete

that is calling "ifup eth0 -o dhcp" (PID 2505) and that it were
started at:

16:02:07.005044115 +0100 exdeb.ifup_2505.2500._.eth0.-o.dhcp

and is executing:
   if-up.d/21-dhcpcd-hook-samba eth0 eth0 -o dhcp (PID 2600)
that waits for "systemctl force-reload nmb.service" to come back.

Note, that the "ifup eth0 -o dhcp" (PID 2505) didn't come back until:

16:02:21.286037345 exdeb.ifup_3265.915._.ib0.-o.rc.onboot
16:02:21.826189744 exdeb.ifstatus_3405.915._.eth0.-o.rc.onboot
16:02:21.962175997 exdeb.ifstatus-dhcp_3442.3405._.eth0.eth0.-o.rc.onboot
16:02:23.235222410 exdeb.ifstatus_3521.915._.eth0.-o.rc.onboot
16:02:23.376453364 exdeb.ifstatus-dhcp_3558.3521._.eth0.eth0.-o.rc.onboot
16:02:24.647099888 exdeb.ifstatus_3637.915._.eth0.-o.rc.onboot
16:02:24.788684143 exdeb.ifstatus-dhcp_3674.3637._.eth0.eth0.-o.rc.onboot
16:02:26.061014315 exdeb.ifstatus_3753.915._.eth0.-o.rc.onboot
16:02:26.203829401 exdeb.ifstatus-dhcp_3790.3753._.eth0.eth0.-o.rc.onboot
16:02:27.474755981 exdeb.ifstatus_3869.915._.eth0.-o.rc.onboot
16:02:27.619545068 exdeb.ifstatus-dhcp_3906.3869._.eth0.eth0.-o.rc.onboot
16:02:27.885805363 exdeb.ifstatus_3983.915._.eth0.-o.rc.onboot
16:02:28.021800939 exdeb.ifstatus-dhcp_4020.3983._.eth0.eth0.-o.rc.onboot
16:02:29.013067777 exdeb.network-remotefs_4150.1._.start
16:02:29.452161991 exdeb.network_4365.4150._.start.-o.remotefs
16:02:30.678635039 exdeb.ifstatus_4879.4365._.lo

The above log file shows, that the
   if-up.d/21-dhcpcd-hook-samba eth0 eth0 -o dhcp (PID 2600)
is still running and calling "systemctl force-reload winbind.service now.
Comment 7 Frederic Crozat 2012-02-21 13:33:12 UTC
please make sure samba-client from http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/ is installed too (update containing this fix is in maintenance team hands).
Comment 8 Marius Tomaschewski 2012-02-21 13:45:24 UTC
Note: You don't need to _use_ any samba -- it is enough that samba-client
is installed. Please update any samba related packages from:

http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/

BTW: Workaround is to move the /etc/sysconfig/network/if-*.d/21*
     (and 55-samba-winbindd?) links away...

Resolving as duplicate of bug 724777. See also bug 732930 alias bug
732395 alias bug 725503 alias ...

*** This bug has been marked as a duplicate of bug 724777 ***
Comment 9 Harald Koenig 2012-02-21 15:19:34 UTC
(In reply to comment #8)
> Note: You don't need to _use_ any samba -- it is enough that samba-client
> is installed. Please update any samba related packages from:
> 
> http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/

interesting! 

ACK: with new samba-client the network startup is ok:

$ systemctl status network.service
network.service - LSB: Configure the localfs depending network interfaces
          Loaded: loaded (/etc/init.d/network)
          Active: active (running) since Tue, 21 Feb 2012 16:04:30 +0100; 11min ago
         Process: 1019 ExecStart=/etc/init.d/network start (code=exited, status=0/SUCCESS)
          CGroup: name=systemd:/system/network.service
                  ? 2062 /sbin/dhclient6 -6 -cf /var/lib/dhcp6/dhclient6.eth0.conf -lf /var/lib...
                  ? 2566 /sbin/dhcpcd --netconfig -L -E -HHH -c /etc/sysconfig/network/scripts/...



now the only remaining failure is NetworkManager because I've disabled it (IMHO still not a valid and "wanted" failure -- as failures are never "wanted", see bug #745653):

NetworkManager.service    loaded failed failed        Network Manager



thanks!
Comment 10 Marius Tomaschewski 2012-02-21 16:35:02 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > Note: You don't need to _use_ any samba -- it is enough that samba-client
> > is installed. Please update any samba related packages from:
> > 
> > http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/
> 
> interesting!

Yes :-)

> ACK: with new samba-client the network startup is ok:

OK, thanks for verification.


> now the only remaining failure is NetworkManager because I've disabled it
> (IMHO still not a valid and "wanted" failure -- as failures are never
> "wanted", see bug #745653):
> 
> NetworkManager.service    loaded failed failed        Network Manager

Yes, I'd say it is WONTFIX.

In LSB init scripts, you can return:

 5 (program is not installed) alias skipped,
 6 (program is not configured) alias unused,
or in a "status" query
 3 (service not running) alias unused.

this are basically also errors, but displayed differently.

Systemd does not provide such a possibility.

There are several possibilities to set a "Condition" (man systemd.unit)
for the start, but there is AFAIK no one that would allow to execute a
test to check a custom condition [ConditionExecuteTest :-)], so we were
forced to use ExecStartPre="/usr/bin/test "x${NETWORKMANAGER}" = xyes".

This works (NM is not started), but is considered as an error.

This is something we'll probably address in 12.2.