Bug 150787 - NetworkManager: driver gets stuck and cannot cancel activation
Summary: NetworkManager: driver gets stuck and cannot cancel activation
Status: RESOLVED FIXED
Alias: None
Product: SUSE Linux 10.1
Classification: openSUSE
Component: Network
Version: Beta 3
Hardware: i686 SuSE Linux 10.1
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: Robert Love
QA Contact: E-mail List
Blocks: 144305
Reported: 2006-02-14 13:49 UTC by Forgotten User ZhJd0F0L3x
Modified: 2006-03-17 11:56 UTC

Found By: Component Test


Description Forgotten User ZhJd0F0L3x 2006-02-14 13:49:34 UTC
While trying to connect to a wireless network (see bug #150784), if I then select another network to connect to, wired or wireless, NM hangs for a long time in a loop of:
NetworkManager: <information>   Activation (air): waiting for device to cancel activation.
NetworkManager: <information>   Activation (air): waiting for device to cancel activation.

If I restart NetworkManager at this point, a wpa_supplicant process is left behind and renders all further wireless attempts useless.

This means that wireless no longer works at all until the next reboot => blocker.
Comment 1 Robert Love 2006-02-14 16:25:40 UTC
What driver?

What does the NM log show after you restart it?

NM should kill wpa_supplicant when it restarts.

What happens if you run `killall wpa_supplicant`?

What about `killall -9 wpa_supplicant`?

Do they die or is it that wpa_supplicant is stuck?
Comment 2 Forgotten User ZhJd0F0L3x 2006-02-14 17:25:15 UTC
(In reply to comment #1)
> What driver?

ipw2200

> What does the NM log show after you restart it?

Nothing special. It also spawns a new wpa_supplicant if it needs one.
Unfortunately, the old and the new one interact badly :-)

> NM should kill wpa_supplicant when it restarts.

It doesn't, at least not if it is not restarted "gracefully".

> What happens if you `killall wpa_supplicant` ?

They go away.

> What about `killall -9 wpa_supplicant` ?

Not needed.

> Do they die or is it that wpa_supplicant is stuck?

No. It looks like nobody simply tells them to get lost.

This is easily reproducible:
- Start NM with the cable connected and the WLAN switch off.
- Select "connect to other wireless network" and enter a network with WPA encryption.
- While NM is trying to connect, switch back to wired.
- NM logs "Activation (air): waiting for device to cancel activation."
- Now immediately run `rcnetwork restart`.
- NM does not exit; after some time it is killed with `kill -9` and a new NM starts.
- The old wpa_supplicant is still hanging around.
Comment 3 Robert Love 2006-02-14 20:35:50 UTC
If you `kill -9` the thing, of course it cannot clean up the rogue wpa_supplicant...

Additionally, the "waiting for device to cancel ..." message is caused by a stuck driver.

Talked this over with JP; marking severity down to normal.
Comment 4 Forgotten User ZhJd0F0L3x 2006-02-14 20:56:40 UTC
(In reply to comment #3)
> If you `kill -9` the thing, of course it cannot clean up the rogue
> wpa_supplicant...

Of course, but the problem is that I (or rather the initscript) _have_ to `kill -9` it, because it does not exit within 5 seconds.

> Additionally, the "waiting for device to cancel ..." is caused by a stuck
> driver.

I am quite sure that I have seen this even with cable interfaces, so I am not sure this is true. Also, starting NM again immediately afterwards makes the interface work fine, so I do think that the state machine in NM (if there is one) gets confused.
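Comments 2 and 4 describe the shutdown path that leaves the supplicant behind: the initscript asks NetworkManager to exit, waits about 5 seconds, and then falls back to `kill -9`, which gives the daemon no chance to stop its wpa_supplicant child. A minimal POSIX shell sketch of that escalation, assuming SIGTERM as the initial signal, might look like this; the function names `is_running` and `stop_with_timeout` are hypothetical, and this is not the actual SUSE `rcnetwork` script.

```shell
#!/bin/sh
# Hypothetical sketch of a stop-with-grace-period routine, modelled on the
# behaviour described in this report. Not the real rcnetwork initscript.

# Succeeds (exit status 0) while a process with PID $1 still exists.
is_running() {
    kill -0 "$1" 2>/dev/null
}

# Send SIGTERM to PID $1, wait up to $2 seconds (default 5) for it to exit,
# then escalate to SIGKILL. Returns 0 on a clean exit, 1 if SIGKILL was needed.
stop_with_timeout() {
    _pid=$1
    _grace=${2:-5}
    # Nothing to do if the process is already gone.
    kill -TERM "$_pid" 2>/dev/null || return 0
    # Poll once per second until the process exits or the grace period ends.
    while [ "$_grace" -gt 0 ] && is_running "$_pid"; do
        sleep 1
        _grace=$((_grace - 1))
    done
    if is_running "$_pid"; then
        # Forced kill: the process gets no chance to stop its own children.
        kill -KILL "$_pid" 2>/dev/null || true
        return 1
    fi
    return 0
}
```

A caller could then clean up only on the forced path, e.g. `stop_with_timeout "$nm_pid" 5 || killall wpa_supplicant`, where `nm_pid` is a placeholder for the daemon's PID. This only removes the leftover supplicant; it does not address the stuck cancellation itself.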
Comment 5 Robert Love 2006-03-06 16:34:50 UTC
The activation and activation-cancel state machine has been reworked, which should hopefully improve cancellation with misbehaving drivers.

Can you please test 0.6.0cvs20060306 (just submitted to autobuild) or later? I also built packages:

http://primates.ximian.com/~rml/misc/NetworkManager/

When canceling, you need to wait for the full timeout, which is about 90 seconds, so be patient. Follow your steps from comment #2, but do not kill the daemon; just wait and see whether it cancels.

Let me know. Thanks!
Comment 6 Forgotten User ZhJd0F0L3x 2006-03-17 11:56:21 UTC
I switched around between networks and it canceled just fine:

Mar 17 11:58:10 susi NetworkManager: <information>      air: Device is fully-supported using driver 'ipw2200'.
Mar 17 11:58:10 susi NetworkManager: <information>      nm_device_init(): waiting for device's worker thread to start
Mar 17 11:58:10 susi NetworkManager: <information>      nm_device_init(): device's worker thread started, continuing.
Mar 17 11:58:10 susi NetworkManager: <information>      Now managing wireless (802.11) device 'air'.
Mar 17 11:58:10 susi NetworkManager: <information>      Deactivating device air.
Mar 17 11:58:10 susi NetworkManager: <information>      Stopping ypbind.
Mar 17 11:58:10 susi NetworkManager: <information>      Restarting autofs.

Mar 17 11:58:13 susi NetworkManager: <information>      cable: Device is fully-supported using driver 'e1000'.
Mar 17 11:58:13 susi NetworkManager: <information>      nm_device_init(): waiting for device's worker thread to start
Mar 17 11:58:13 susi NetworkManager: <information>      nm_device_init(): device's worker thread started, continuing.
Mar 17 11:58:13 susi NetworkManager: <information>      Now managing wired Ethernet (802.3) device 'cable'.
Mar 17 11:58:13 susi NetworkManager: <information>      Deactivating device cable.
Mar 17 11:58:13 susi NetworkManager: <information>      Stopping ypbind.
Mar 17 11:58:13 susi NetworkManager: <information>      Restarting autofs.
Mar 17 11:58:13 susi NetworkManager: <information>      Will activate wired connection 'cable' because it now has a link.
Mar 17 11:58:13 susi NetworkManager: <information>      SWITCH: no current connection, found better connection 'cable'.
Mar 17 11:58:13 susi NetworkManager: <information>      Will activate connection 'cable'.
Mar 17 11:58:13 susi NetworkManager: <information>      Device cable activation scheduled...
Mar 17 11:58:13 susi NetworkManager: <information>      Activation (cable) started...
Mar 17 11:58:13 susi NetworkManager: <information>      Activation (cable) Stage 1 of 5 (Device Prepare) scheduled...
Mar 17 11:58:13 susi NetworkManager: <information>      Activation (cable) Stage 1 of 5 (Device Prepare) started...
Mar 17 11:58:13 susi NetworkManager: <information>      Activation (cable) Stage 2 of 5 (Device Configure) scheduled...

I'd say this is fixed. If I run into problems again, I'll reopen.