Bug 915160

Summary: Cannot connect to some WIFIs (eduroam) with Thinkpad T420 (regression from 13.1)
Product: [openSUSE] openSUSE Distribution
Reporter: Markus Zimmermann <markus.zimmermann>
Component: Network
Assignee: wicked maintainers <wicked-maintainers>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: chcao, dimstar, markus.zimmermann, mt, pwieczorkiewicz
Version: 13.2   
Target Milestone: ---   
Hardware: Other   
OS: Other   
See Also: https://bugzilla.suse.com/show_bug.cgi?id=900815
https://bugzilla.suse.com/show_bug.cgi?id=900786
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: Output of the systemd journal
Logs from /var/log such as wpa_supplicant.log
TCPdump during a failed connection attempt
systemd's journal output during a failed connection attempt
wpa_supplicant log during a failed connection attempt

Description Markus Zimmermann 2015-01-28 12:25:21 UTC
I am using a freshly installed openSUSE 13.2 and I cannot connect to the WiFi networks of my university. Other WiFi networks work. I am especially interested in the eduroam network, which uses WPA2 PEAP with MSCHAPv2. I will attach the logs that were printed while connecting to the AP. It looks like the DHCP request times out.

This worked with openSUSE 13.1. It matters to me that this works out of the box, because otherwise it is hard to convince people at the university to use openSUSE.
Comment 1 Markus Zimmermann 2015-01-28 12:25:57 UTC
Created attachment 621208 [details]
Output of the systemd journal
Comment 2 Markus Zimmermann 2015-01-28 12:27:27 UTC
Created attachment 621209 [details]
Logs from /var/log such as wpa_supplicant.log
Comment 3 Markus Zimmermann 2015-01-28 12:55:30 UTC
I forgot to state that I am using KDE4 with the NetworkManager applet.
Comment 4 Dominique Leuenberger 2015-01-29 10:05:23 UTC
Looking only at the output of dhcp client:

Jan 28 12:54:14 schroeder.site dhclient[3170]: DHCPDISCOVER on wlp3s0 to 255.255.255.255 port 67 interval 2 (xid=0x696de5a7)
Jan 28 12:55:02 schroeder.site dhclient[3177]: DHCPOFFER from 1.1.1.1
Jan 28 12:55:07 schroeder.site dhclient[3177]: DHCPREQUEST on wlp3s0 to 255.255.255.255 port 67 (xid=0x43f4582b)
Jan 28 12:55:12 schroeder.site dhclient[3177]: DHCPREQUEST on wlp3s0 to 255.255.255.255 port 67 (xid=0x43f4582b)
Jan 28 12:55:26 schroeder.site dhclient[3177]: DHCPDISCOVER on wlp3s0 to 255.255.255.255 port 67 interval 5 (xid=0x319147cf)
Jan 28 12:55:31 schroeder.site dhclient[3177]: DHCPDISCOVER on wlp3s0 to 255.255.255.255 port 67 interval 11 (xid=0x319147cf)
Jan 28 12:55:42 schroeder.site dhclient[3177]: DHCPDISCOVER on wlp3s0 to 255.255.255.255 port 67 interval 17 (xid=0x319147cf)

There is nothing NM can do to recover from this:

The DHCP packet flow is generally: DHCPDISCOVER - DHCPOFFER - DHCPREQUEST - DHCPACK

The client seems to be doing everything correctly: it searches for a DHCP server, which replies with an offer (server 1.1.1.1); the client 'accepts' the offer and asks for confirmation of the lease... which in turn is not acknowledged by the DHCP server => the client has to give up.

Also interesting: after the server made its first offer but did not send the confirmation back, subsequent discover requests from the client seem to go unanswered...
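
To check a longer log for the same pattern, a small filter can list the DHCP message types in order and say whether a DHCPACK ever arrived. This is only a sketch: the function name dhcp_handshake_summary is made up for illustration, and it assumes dhclient's usual log wording (DHCPDISCOVER, DHCPOFFER, DHCPREQUEST, DHCPACK, DHCPNAK).

```shell
# Read dhclient log lines on stdin, print the DHCP message types in the
# order they appear, then report whether the handshake reached DHCPACK.
# The function name is hypothetical, not part of any tool.
dhcp_handshake_summary() {
    # Keep only the message type token from each matching line.
    types=$(grep -oE 'DHCP(DISCOVER|OFFER|REQUEST|ACK|NAK)' | tr '\n' ' ')
    printf 'sequence: %s\n' "${types% }"
    case " $types" in
        *" DHCPACK "*) echo 'result: lease acknowledged' ;;
        *)             echo 'result: no DHCPACK seen' ;;
    esac
}

# Usage on a live system (assumes dhclient logs to the journal):
#   journalctl -b | grep dhclient | dhcp_handshake_summary
```

Fed the excerpt above, it would report that no DHCPACK was seen, which matches the reading that the server never confirmed the lease.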

Passing the bug to the Network Team / dhcpcd maintainer
Comment 5 Peter Varkoly 2015-02-01 08:10:48 UTC
This is an issue of dhclient, not dhcpcd!
Comment 6 Markus Zimmermann 2015-02-01 12:28:47 UTC
I noticed that the services wickedd-dhcp4.service and wickedd-dhcp6.service are not running. Should I enable them?

And I remembered that I could not connect some years ago with older KDE versions. But that was the fault of KDE's NM UI. Maybe the old problem is back? https://bugs.kde.org/show_bug.cgi?id=209673

I also remembered a problem with the WIFI N mode: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/575492 I will try that too. But I thought that these problems were all resolved with newer KDE/NM/Linux versions. What do you think?
Comment 7 Chenzi Cao 2015-02-02 07:31:04 UTC
Hi Marius, would you please have a look at this? I'm not sure whether it is right to assign it to you; please feel free to reassign whenever necessary. Thank you!
Comment 8 Marius Tomaschewski 2015-02-03 13:17:41 UTC
Please ensure that wicked and NetworkManager are not running at the same time.

They basically conflict with each other, e.g. both try to reconfigure
wpa_supplicant, both start DHCP clients (wicked provides its own and does
not use dhclient at all), or both reconfigure the same interface.

See also bug#895447 (yast2 lan leaves part of NetworkManager running); more exactly: switching between NetworkManager and wicked does not work properly.

The manual steps to switch between them are:

wicked-to-NM)

  ## stop wicked, any need to stop network
  systemctl stop wickedd.service

  ## disable all wicked services
  systemctl disable wicked.service

  ## enable NetworkManager (creates network.service link)
  systemctl enable NetworkManager.service

  ## start network / NetworkManager
  systemctl start network.service
;;

NM-to-wicked)

  ## stop NetworkManager and all helpers it started...
  systemctl --kill-who=all kill NetworkManager.service
  systemctl stop NetworkManager.service

  ## disable NetworkManager
  systemctl disable NetworkManager.service

  ## enable wicked services (creates network.service link)
  systemctl enable wicked.service

  ## start wickedd daemons
  systemctl start wickedd.service

  ## start the network
  systemctl start network.service
;;
Comment 9 Markus Zimmermann 2015-02-03 13:24:04 UTC
Oh, I only use the default settings so NM is running. Wicked is disabled (I checked). It was just a question if this might be a problem!
Comment 10 Marius Tomaschewski 2015-02-03 13:27:29 UTC
Regarding dhcp -- could you capture the DHCP traffic with tcpdump while NM
sets up the interface (start NetworkManager)? First run:

ip link set up wlp3s0
tcpdump -envfi wlp3s0 -s 65535 -U -w dhcp4.out 'udp port 67 or udp port 68'

then trigger NetworkManager to set up the interface. After a while,
abort tcpdump with Ctrl-C and attach the dhcp4.out file along with the
/var/log/NetworkManager and /var/log/wpa_supplicant.log log files.

It makes sense to reset the log files beforehand, e.g.:
  cp -b /dev/null /var/log/NetworkManager
  cp -b /dev/null /var/log/wpa_supplicant.log
Comment 11 Marius Tomaschewski 2015-02-03 13:28:09 UTC
You can do it using wireshark (=GUI) as well.
Comment 12 Marius Tomaschewski 2015-02-03 13:31:17 UTC
(In reply to Markus Zimmermann from comment #9)
> Oh, I only use the default settings so NM is running. Wicked is disabled (I
> checked). It was just a question if this might be a problem!

wicked.service is just a starter -- check that the wickedd* services are not
only disabled but also not running (ps ax | grep wickedd). systemctl disable != systemctl stop.
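
Since "disabled" and "not running" are separate things, the runtime state can be queried directly with `systemctl is-active`. The helper below is only a sketch with a made-up name (network_stack_state); it classifies the two states it is given. It looks at wickedd.service only, so the ps check above is still the way to catch helper daemons such as wickedd-dhcp4.

```shell
# Classify the combination of two unit states as printed by
# `systemctl is-active` ("active", "inactive", "failed", ...).
# $1: state of NetworkManager.service, $2: state of wickedd.service.
# The function name is hypothetical.
network_stack_state() {
    nm_state=$1
    wicked_state=$2
    if [ "$nm_state" = active ] && [ "$wicked_state" = active ]; then
        echo 'conflict: NetworkManager and wickedd are both running'
    elif [ "$nm_state" = active ]; then
        echo 'ok: only NetworkManager is running'
    elif [ "$wicked_state" = active ]; then
        echo 'ok: only wickedd is running'
    else
        echo 'neither NetworkManager nor wickedd is running'
    fi
}

# Usage on a live system:
#   network_stack_state "$(systemctl is-active NetworkManager.service)" \
#                       "$(systemctl is-active wickedd.service)"
```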
Comment 13 Markus Zimmermann 2015-02-03 13:36:44 UTC
`ps aux | grep -i wick` does not show a wicked process and `systemctl list-unit-files | grep wick` lists only disabled services.

I will visit my university tomorrow and send you the tcpdump then!
Comment 14 Markus Zimmermann 2015-02-12 13:30:32 UTC
I've been working in another building and, funnily enough, it worked there. So I guess it has something to do with the APs used in this building, where I am right now. I will attach the tcpdump output and the logs as you requested.
Comment 15 Markus Zimmermann 2015-02-12 13:31:33 UTC
Created attachment 623076 [details]
TCPdump during a failed connection attempt
Comment 16 Markus Zimmermann 2015-02-12 13:32:10 UTC
Created attachment 623077 [details]
systemd's journal output during a failed connection attempt
Comment 17 Markus Zimmermann 2015-02-12 13:32:35 UTC
Created attachment 623078 [details]
wpa_supplicant log during a failed connection attempt
Comment 18 Markus Zimmermann 2015-02-12 13:51:37 UTC
I remembered that I used to have the same problems with my other Thinkpads, e.g. an X200s, a very long time ago. The solution was to disable 802.11n.
I just did that on the Thinkpad T420 I am using right now:
options iwlwifi 11n_disable=1
And it seems to work now!
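
For anyone who wants to reproduce this, the options line goes into a file under /etc/modprobe.d. The helper below is a sketch, not an official tool: the function name is made up, and the file name 50-iwlwifi.conf is an assumption (any *.conf file in that directory should be read by modprobe).

```shell
# Append an "options iwlwifi ..." line to a modprobe.d file unless an
# identical line is already there.
# $1: config file, $2: option string (e.g. 11n_disable=1).
# The function name is hypothetical.
set_iwlwifi_option() {
    conf=$1
    opt=$2
    line="options iwlwifi $opt"
    if [ -f "$conf" ] && grep -qxF "$line" "$conf"; then
        return 0    # already configured, nothing to do
    fi
    printf '%s\n' "$line" >> "$conf"
}

# Usage (as root; the file name is an assumption):
#   set_iwlwifi_option /etc/modprobe.d/50-iwlwifi.conf 11n_disable=1
# Then reboot (or reload the iwlwifi driver modules) so the option applies.
```

The check for an existing line keeps the file from collecting duplicates if the helper is run more than once.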

Since Thinkpads are quite common on this campus, is there any way to incorporate this into the distribution/upstream projects? Or even to find out why this is not working and fix it?
Comment 19 Marius Tomaschewski 2016-05-20 08:33:15 UTC
Pawel,
it seems to be some HW/chipset related thing -- can you take a look?
Comment 20 Marius Tomaschewski 2016-05-20 08:35:29 UTC
Markus,
is it still required to set the parameter on e.g. leap-42.1 (all updates installed), which uses a much more recent kernel (-> drivers)?
Comment 21 Markus Zimmermann 2016-05-20 08:40:46 UTC
I still use 13.2 on this particular notebook with 11n disabled. I will try to upgrade in the coming weeks and visit the university again. However, if you like you can close this issue for now since I cannot do anything right now.
Comment 22 Pawel Wieczorkiewicz 2016-05-20 08:59:17 UTC
(In reply to Marius Tomaschewski from comment #19)
> Pawel,
> it seems to be some HW/chipset related thing -- can you take a look?

Yes, this is a typical iwlwifi driver problem of 2015. I had one myself and experienced a lot of similar issues.

The association, authentication and connection work fine. There is no problem with NM or the dhcp client. It is simply that the first data transferred over the link (the dhcp discovery here) makes the driver wrongly conclude that something is wrong (low signal, bad channel or something like that), and that triggers the reason=3 local disconnection.
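
To see how often this happens, the disconnect events in the wpa_supplicant log can be counted per reason code. A rough sketch, assuming the usual "CTRL-EVENT-DISCONNECTED bssid=... reason=N" wording of those log lines; the function name is made up for illustration.

```shell
# Count CTRL-EVENT-DISCONNECTED events per reason code in a wpa_supplicant
# log read from stdin. Prints one "reason=N count" line per reason code.
# The function name is hypothetical.
count_disconnect_reasons() {
    grep 'CTRL-EVENT-DISCONNECTED' \
        | grep -oE 'reason=[0-9]+' \
        | sort | uniq -c \
        | awk '{ print $2, $1 }'
}

# Usage:
#   count_disconnect_reasons < /var/log/wpa_supplicant.log
```

A log dominated by reason=3 entries right after each successful association would fit the pattern described here.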

This is what helped me back then:
1) sudo iwconfig wlan0 power off
2) add wd_disable=0 option to your iwlwifi line in your /etc/modprobe.d/50-iwlwifi.conf (or similar).

But hopefully these problems are history now with more recent drivers for Intel Centrino and similar chipsets. So definitely upgrade and give them a try first.
Comment 23 Pawel Wieczorkiewicz 2016-05-20 15:10:11 UTC
The issue is probably related to one of the following bugs:
https://bugzilla.suse.com/show_bug.cgi?id=900815
https://bugzilla.suse.com/show_bug.cgi?id=900786