Bug 359793

Summary: Haldaemon script can cause race
Product: [openSUSE] openSUSE 11.0 Reporter: Magnus Boman <mboman>
Component: Basesystem Assignee: Danny Al-Gaaf <dalgaaf>
Status: RESOLVED FIXED QA Contact: Stephan Kulow <coolo>
Severity: Blocker    
Priority: P5 - None CC: clarkt, federico, felix, forgotten_y7f055FA1m, lavrinenko_alex
Version: Alpha 2   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description Magnus Boman 2008-02-08 00:10:22 UTC
More often than not, NetworkManager fails to start, with the following content in /var/log/NetworkManager:

Feb  8 11:05:33 linux NetworkManager: <info>  starting...
Feb  8 11:05:33 linux NetworkManager: <WARN>  nm_hal_manager_new(): Could not initialize connection to the HAL daemon.
Feb  8 11:05:33 linux NetworkManager: <debug> [1202468733.615195] nm_print_open_socks(): Open Sockets List:
Feb  8 11:05:33 linux NetworkManager: <debug> [1202468733.615725] nm_print_open_socks(): Open Sockets List Done.

HAL is running so looks like some sort of timing issue.

Happens in both VirtualBox and physical machine.
Comment 1 Tambet Ingo 2008-02-08 14:17:22 UTC
Is that from startup? When does HAL get started? Most likely it's just not ready to accept connections yet.
Comment 2 Clark Tompsett 2008-02-08 22:05:07 UTC
I have the same issue on 64-bit: it sees neither the NIC with a cable plugged in nor the wireless card (bcm4311 with b43 and firmware loaded).

It does work on a Gateway ML3109 laptop in 32-bit mode.
Comment 3 Clark Tompsett 2008-02-09 18:12:37 UTC
On 64-bit, NetworkManager is an older version (0.7.0-15) than the one used in the 32-bit version of 11.0A2 (0.7.0-17).  I pulled the source code for 0.7.0-18 and rebuilt the rpm.  Now NM does not report the HAL error, and it sees the network devices.
Comment 4 Magnus Boman 2008-02-10 00:04:27 UTC
Tambet;
Yes, that's from the startup. It never recovers from that even after hal is up and running.


In reply to comment #3: -15 and -17 are the same code. The numbers just mean that NM has been recompiled two more times (with no code changes) because one or more of its dependencies changed.
Comment 5 Felix Möller 2008-02-10 21:03:34 UTC
I have seen this too on my 32-bit setup with current Factory, but it happened only once in 10 boots or so. I have no idea how to reproduce it...
Comment 6 Stephan Kulow 2008-02-11 11:16:57 UTC
64bit is broken, but 32bit most likely needs to wait for HAL ;(
Comment 7 Stephan Kulow 2008-02-11 22:28:13 UTC
btw, I fixed the 64bit issue for factory, but due to hackweek it won't be synced out too quickly.
Comment 8 Federico Mena Quintero 2008-02-12 17:26:45 UTC
This doesn't work on i386 for me.  The last half-dozen lines in the log say

  Trying to start the supplicant...

Comment 9 Tambet Ingo 2008-02-12 17:46:14 UTC
NetworkManager is trying to activate wpa_supplicant dbus system service but it never appears on the bus. Reassigning to DBUS maintainer.
Comment 10 Timo Hoenig 2008-02-13 11:17:58 UTC
(In reply to comment #9 from Tambet Ingo)

> NetworkManager is trying to activate wpa_supplicant dbus system service but it
> never appears on the bus. Reassigning to DBUS maintainer.

How can you be sure?  Without logs?

Works for me using FACTORY.

If you can convince me that this is a bug in D-Bus, I'm happy to fix it. 

Comment 11 Alexander Lavrinenko 2008-02-13 12:20:06 UTC
NetworkManager seems to be broken on x86-64:

[superuser@shark :: ~]NetworkManager --no-daemon
NetworkManager: <info>  starting...
NetworkManager: <info>  Found radio killswitch /org/freedesktop/Hal/devices/ipw_wlan_switch
NetworkManager: symbol lookup error: NetworkManager: undefined symbol: nl_handle_alloc_nondefault
[superuser@shark :: ~]rpm -qa|grep Network
NetworkManager-glib-0.7.0-15
NetworkManager-0.7.0-15
NetworkManager-kde-0.7r759902-18
[superuser@shark :: ~]uname -a
Linux shark 2.6.24.1-35-default #1 SMP 2008/02/12 01:00:18 UTC x86_64 x86_64 x86_64 GNU/Linux
[superuser@shark :: ~]cat /etc/*release*
openSUSE 11.0 (X86-64) Alpha2
VERSION = 11.0



Comment 12 Stephan Kulow 2008-02-13 12:42:16 UTC
please do not mix the 64bit problem with this bug - see #6
Comment 13 Clark Tompsett 2008-02-24 02:26:27 UTC
This appears to be a timing problem.   11A2-64 is running in a separate partition.  Running rcnetwork restart starts NetworkManager, as shown by ifconfig and cat /var/log/NetworkManager.  I added the following line to /etc/init.d/network at line 104:
sleep 2s

Now NetworkManager starts correctly and the message shown in #1 is gone.
Comment 14 Tambet Ingo 2008-02-29 22:48:27 UTC
So what do we do with this bug? The timing bug is not a NetworkManager bug; NetworkManager requires dbus and hal to be functional when it starts. Comment #8 is something different: it indicates that dbus is not able to start wpa_supplicant using system service activation for some reason (that's why I reassigned it to dbus). It might be caused by using an old dbus (1.0 vs 1.1) or an old wpa_supplicant that didn't have dbus support.
Comment 15 JP Rosevear 2008-03-04 16:55:59 UTC
dbus and haldaemon are actually in the start requirements of network; however, the haldaemon rc script uses 'startproc' to launch, which can lead to races with daemons that daemonize properly. See bug 332845 for the discussion of the dbus fix.  Kay knows the background.

Danny, any thoughts on changing the hal script?
Comment 16 Kay Sievers 2008-03-05 13:35:52 UTC
Yes, please get rid of startproc usage entirely, like we already did for a bunch of other users. We must never detach into the background via an external tool like this; only the daemon itself can do this properly.
Comment 17 Danny Al-Gaaf 2008-03-06 12:19:17 UTC
Btw, NetworkManager should be able to handle the situation where HAL isn't ready, as other applications do. When NM starts, it can check whether HAL is available and, if not, wait until HAL is there. You can see this very easily by monitoring D-Bus.
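
The check-then-wait behavior suggested here can be sketched roughly as follows. This is only an illustration in Python, not NetworkManager code (NM is written in C): `wait_until_ready` and its `probe` argument are hypothetical names, and the probe stands in for whatever test a real client would use to see whether HAL has appeared on the system bus.

```python
import time


def wait_until_ready(probe, timeout=10.0, interval=0.2):
    """Poll probe() until it returns True or `timeout` seconds have passed.

    `probe` is a hypothetical callable standing in for "is HAL reachable yet?".
    Returns True if the dependency became ready, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        # Dependency not ready yet: back off briefly instead of giving up.
        time.sleep(interval)
```

Unlike a fixed sleep in the init script, a loop of this kind returns as soon as the dependency is ready and gives up only after the timeout.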
Comment 18 Kay Sievers 2008-03-07 11:56:11 UTC
True, it's always good to have apps deal with that, but HAL's startup behavior caused by startproc is still a real bug that needs to be fixed. HAL has code to make sure that it does not return until it is properly initialized, but startproc daemonizes itself for no good reason, and breaks the dependency logic. We ran into exactly the same problems with udev, D-Bus, ..., so let's just get rid of startproc for all properly implemented system services.
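
To illustrate the point about a daemon not returning until it is properly initialized, here is a simplified sketch in Python. It is not HAL's actual code: `start_daemon`, `initialize` and `serve` are hypothetical names, and a real daemon would typically fork twice and have the launching process exit once it has read the readiness byte.

```python
import os


def start_daemon(initialize, serve):
    """Fork a daemon and return only after it reports that it is initialized.

    Returns (pid, ready). `ready` is False if the child died before
    finishing initialize().
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this process becomes the daemon.
        status = 1
        try:
            os.close(read_fd)
            os.setsid()                  # detach from the caller's session
            initialize()                 # e.g. open sockets, claim bus name
            os.write(write_fd, b"1")     # tell the launcher we are ready
            os.close(write_fd)
            serve()                      # main loop
            status = 0
        finally:
            os._exit(status)
    # Parent (launcher): block until the child signals readiness or exits.
    os.close(write_fd)
    ready = os.read(read_fd, 1) == b"1"
    os.close(read_fd)
    return pid, ready
```

Because the launcher blocks on the pipe, an init script that calls it does not move on to dependent services until initialization has finished. An external tool that backgrounds the process itself returns immediately and has no way of knowing when that point is reached.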
Comment 19 Kay Sievers 2008-03-17 16:22:52 UTC
I have had the same bug for a while on Factory: NM is not running on bootup. It has Required-Start: ... haldaemon, but the dependency is not fulfilled; HAL isn't ready because of startproc.
Please remove it from the HAL init script, or I will do it if you don't have the time. Thanks!
Comment 20 Danny Al-Gaaf 2008-03-19 19:08:01 UTC
Submitted package. Until it is uploaded, use hal from:
http://download.opensuse.org/repositories/home:/dkukawka:/hal-beta/openSUSE_Factory/
Comment 21 Stephan Kulow 2008-03-26 10:45:36 UTC
*** Bug 372733 has been marked as a duplicate of this bug. ***