Bug 1179971 - iwlwifi - "Microcode SW error detected"
Status: RESOLVED WORKSFORME
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: Other Other
Importance: P5 - None : Major
Assigned To: openSUSE Kernel Bugs
Reported: 2020-12-12 02:41 UTC by S. B.
Modified: 2022-02-19 15:50 UTC (History)



Attachments
relevant dmesg (41.63 KB, text/plain)
2020-12-12 02:45 UTC, S. B.
hwinfo - Thinkpad T530 all Intel hardware (448.33 KB, text/plain)
2020-12-12 02:46 UTC, S. B.

Description S. B. 2020-12-12 02:41:55 UTC
Hi, I'm experiencing relatively frequent issues with my Intel WiFi connection dropping, at least once a day (Lenovo T530 laptop). Just now I had it drop about 5 times in a row during a Zoom meeting. By "drop" I mean that NetworkManager shows no connection for about 10 seconds, and it has to re-associate. Of course, multiple things could be going wrong given that it's WiFi, but my router is a high-quality Mikrotik device giving a strong signal to my laptop, and I don't have major problems with WiFi spectrum congestion in this area. Also, the router log is showing corresponding events like "disconnected, ok", whereas in the past, when I had problems with the actual connection quality, it would throw an error indicating there was "major data loss". I also found some nasty-looking errors with traces in dmesg that seem to correspond to the timeframes of the episodes, although I can't be sure because I put my machine to sleep every night and therefore the timestamps are wrong. I pulled the most interesting bits out and changed the MAC addresses. I should also mention that I usually have a Bluetooth headphone connected. Thanks for the help!
Comment 1 S. B. 2020-12-12 02:45:26 UTC
Created attachment 844411 [details]
relevant dmesg
Comment 2 S. B. 2020-12-12 02:46:08 UTC
Created attachment 844412 [details]
hwinfo - Thinkpad T530 all Intel hardware
Comment 3 Takashi Iwai 2020-12-12 08:00:19 UTC
I'm afraid this is a long-standing problem that upstream has paid little attention to.  As shown in the dmesg output, this is likely a firmware problem, and Intel won't provide new firmware for old chips.

While we should report this again to the upstream bug tracker (bugzilla.kernel.org), you may try some iwlwifi module options that are related to power management.
Also, downgrading the firmware might be worth trying, too.  The chip is iwlwifi-6000g2a, and currently iwlwifi-6000g2a-6.ucode is used.  So try removing the
  /lib/firmware/iwlwifi-6000g2a-6.ucode (or ucode.xz)
file and rebooting.  (It may be safer to move the file to another place or rename it instead of removing it, so that you can restore it later.)
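
For illustration only, the "rename instead of remove" suggestion could be sketched in Python as below. The two file names come from the comment above; the `.disabled` suffix and the function names are hypothetical choices, and the directory is a parameter so the sketch can first be tried on a copy rather than on /lib/firmware itself.

```python
from pathlib import Path

# File names taken from the comment above (plain and xz-compressed variant).
FIRMWARE_NAMES = ("iwlwifi-6000g2a-6.ucode", "iwlwifi-6000g2a-6.ucode.xz")


def set_aside_firmware(firmware_dir, suffix=".disabled"):
    """Rename matching firmware files instead of deleting them.

    Returns the list of new paths so the change can be undone later.
    The suffix is an arbitrary (hypothetical) choice; any name the
    driver does not request would do.
    """
    renamed = []
    for name in FIRMWARE_NAMES:
        src = Path(firmware_dir) / name
        if src.is_file():
            dst = src.with_name(src.name + suffix)
            src.rename(dst)
            renamed.append(dst)
    return renamed


def restore_firmware(renamed, suffix=".disabled"):
    """Undo set_aside_firmware() by stripping the suffix again."""
    for dst in renamed:
        dst.rename(dst.with_name(dst.name[: -len(suffix)]))
```

On a real system this would be called as `set_aside_firmware("/lib/firmware")` with root privileges, followed by a reboot. Whether the driver then works with an older firmware file depends on what else is installed.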
Comment 4 S. B. 2020-12-12 15:03:34 UTC
Hi Takashi, thanks for your extremely helpful and polite response as always.

I think I'll try renaming iwlwifi-6000g2a-6.ucode.xz for now. Hopefully I'll notice a difference after a few long days of work. This bug is just random enough to make troubleshooting difficult, but frequent enough to seriously interrupt important work.

As for power management tweaks, I also suspected that could be a factor, as I use TLP which includes pretty aggressive power management tweaks across the board. Would you recommend anything in particular? I've seen recommendations for:
> options iwlwifi 11n_disable=1 swcrypto=0 power_save=0 #also: 11n_disable=8
> options iwlmvm power_scheme=1 
> options iwlwifi uapsd_disable=1 

As well as:
> options iwlwifi swcrypto=0 
> options iwlwifi power_save=0
> options iwlmvm power_scheme=1 
> options iwlwifi uapsd_disable=1

Or simply:
> options iwlwifi power_save=0

I'm not sure about the 11n_disable option codes or if that has anything to do with this?

There are indeed quite a few reports of this on bugzilla.kernel.org, should I add on to an existing report or open a new one?
- https://bugzilla.kernel.org/show_bug.cgi?id=207409
- https://bugzilla.kernel.org/show_bug.cgi?id=205387
- https://bugzilla.kernel.org/show_bug.cgi?id=208425
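
Regarding the 11n_disable option codes asked about above: the parameter is a bitmap, and the parameter description printed by `modinfo iwlwifi` reads "bitmap: 1: full, 2: disable agg TX, 4: disable agg RX, 8 enable agg TX". A minimal Python sketch for decoding a value, assuming that description matches the kernel in use (the function and table names are made up for this example):

```python
# Bit meanings as printed by `modinfo iwlwifi` for the 11n_disable parameter.
# Verify against the modinfo output of the kernel actually in use.
FLAGS_11N_DISABLE = {
    1: "disable 11n completely",
    2: "disable TX aggregation",
    4: "disable RX aggregation",
    8: "enable TX aggregation",
}


def decode_11n_disable(value):
    """Return the descriptions of all bits set in an 11n_disable value."""
    return [text for bit, text in FLAGS_11N_DISABLE.items() if value & bit]


for value in (1, 8):
    print(value, decode_11n_disable(value))
```

So 11n_disable=1 turns 802.11n off entirely, while 11n_disable=8 keeps 11n and only touches TX aggregation.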
Comment 5 Takashi Iwai 2020-12-13 08:12:01 UTC
(In reply to S. B. from comment #4)
> As for power management tweaks, I also suspected that could be a factor, as
> I use TLP which includes pretty aggressive power management tweaks across
> the board. Would you recommend anything in particular? I've seen
> recommendations for:
> > options iwlwifi 11n_disable=1 swcrypto=0 power_save=0 #also: 11n_disable=8
> > options iwlmvm power_scheme=1 
> > options iwlwifi uapsd_disable=1 
>
> As well as:
> > options iwlwifi swcrypto=0 
> > options iwlwifi power_save=0
> > options iwlmvm power_scheme=1 
> > options iwlwifi uapsd_disable=1
> 
> Or simply:
> > options iwlwifi power_save=0
> 
> I'm not sure about the 11n_disable option codes or if that has anything to
> do with this?

Honestly speaking, I have no idea what really matters.  I would try the bottom line first, i.e. disable all suspicious ones.

> There are indeed quite a few reports of this on bugzilla.kernel.org, should
> I add on to an existing report or open a new one?
> - https://bugzilla.kernel.org/show_bug.cgi?id=207409
> - https://bugzilla.kernel.org/show_bug.cgi?id=205387
> - https://bugzilla.kernel.org/show_bug.cgi?id=208425

It's fine to create a new report unless you find exactly the same bug with the same hardware (Lenovo T530).  You can join other bugs in the meantime, and close yours later as a duplicate.
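
When trying module options one at a time, it can help to confirm which values the loaded driver is really using. Loaded modules expose their readable parameters under /sys/module/<module>/parameters/. The following is a rough Python sketch; the function name and the `sysfs_root` argument are assumptions made for this example (the latter only so the code can be pointed at a test directory).

```python
from pathlib import Path


def read_module_params(module, sysfs_root="/sys/module"):
    """Return {parameter: value} for a loaded kernel module.

    An empty dict means the module is not loaded or exposes no parameters.
    """
    param_dir = Path(sysfs_root) / module / "parameters"
    if not param_dir.is_dir():
        return {}
    params = {}
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters may not be readable by the current user.
            params[entry.name] = None
    return params


if __name__ == "__main__":
    for name, value in read_module_params("iwlwifi").items():
        print(f"{name} = {value}")
```

Comparing this output before and after editing the modprobe configuration shows whether a change has actually been picked up.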
Comment 6 Héctor Sanjuán 2021-02-09 17:49:21 UTC
I also have this problem, also with a Lenovo (X1 Carbon gen. 7), and also with a Mikrotik AP. 

But for me this is new, or at least it is happening much more often now (every few minutes), particularly when the network is very busy.

Apart from the "Microcode SW error detected. Restarting 0x0." I also see quite a bit of:

"kernel: iwlwifi 0000:00:14.3: Unhandled alg: 0xc040071b"

I will try the firmware downgrade and report back.
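
To judge whether a change such as the firmware downgrade makes a difference, the two messages quoted above can be counted in a saved kernel log. This is a minimal Python sketch; the function name is hypothetical, and the log text is assumed to come from something like `journalctl -k -o short-iso`, which prints wall-clock timestamps rather than seconds since boot.

```python
from collections import Counter

# Message fragments quoted in this report.
PATTERNS = ("Microcode SW error detected", "Unhandled alg")


def count_iwlwifi_errors(log_text):
    """Count how many log lines contain each known error fragment."""
    counts = Counter()
    for line in log_text.splitlines():
        for pattern in PATTERNS:
            if pattern in line:
                counts[pattern] += 1
    return counts
```

Running this over logs from comparable periods before and after a change gives a rough error rate to compare, which is easier than relying on impressions with a bug this random.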
Comment 7 S. B. 2021-02-09 21:13:45 UTC
I seem to have fixed that particular error with the following in /etc/modprobe.d/iwlwifi.conf 

> options iwlwifi bt_coex_active=0 swcrypto=1 11n_disable=8

It appears that the issue was also being complicated by some wireless interference causing "extensive data loss" errors in RouterOS, but I managed to switch to a more free channel.
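
When testing several option combinations, generating the configuration line from a table of values avoids typos. The sketch below is only an illustration: the helper name is made up, and the values are the ones from the options line in this comment.

```python
def render_options(module, params):
    """Build one modprobe.d 'options' line from a dict of parameters."""
    pairs = " ".join(f"{name}={value}" for name, value in params.items())
    return f"options {module} {pairs}\n"


# Values taken from the options line in the comment above.
line = render_options(
    "iwlwifi", {"bt_coex_active": 0, "swcrypto": 1, "11n_disable": 8}
)
print(line, end="")
```

The resulting line would go into /etc/modprobe.d/iwlwifi.conf (written as root). New values only apply the next time the module is loaded, for example after a reboot.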
Comment 8 Héctor Sanjuán 2021-02-09 23:19:43 UTC
> I seem to have fixed that particular error with the following in
> /etc/modprobe.d/iwlwifi.conf
>
> > options iwlwifi bt_coex_active=0 swcrypto=1 11n_disable=8

Does not seem to help. I did not enable bt_coex_active, as there is no BT device nearby.

> It appears that the issue was also being complicated by some wireless interference causing "extensive data loss" errors in RouterOS, but I managed to switch to a more free channel.

I am on a very clean spectrum.
Comment 9 Héctor Sanjuán 2021-02-19 08:58:31 UTC
With 5.10.16, along with a firmware update (kernel-firmware-iwlwifi-20210208-1.1), I don't see issues anymore.

I am connected on 5GHz (the issue previously appeared only on this band). I have removed all iwlwifi options from modprobe and have not seen any errors in a while, whereas before I'd suffer them every one or two minutes.

The last kernel release has indeed included fixes to iwlwifi:

https://lwn.net/Articles/846116/
Comment 10 Héctor Sanjuán 2021-02-19 23:06:58 UTC
(In reply to Héctor Sanjuán from comment #9)
> With 5.10.16, along with a firmware update
> (kernel-firmware-iwlwifi-20210208-1.1), I don't see issues anymore.
> 
> I am connected on 5GHz (the issue before only appeared on this band), I have
> removed all iwlwifi options from modprobe and I have not seen any errors in
> a while, where before I'd suffer them every one or two minutes.
> 
> The last kernel release has indeed included fixes to iwlwifi:
> 
> https://lwn.net/Articles/846116/

Whoa, scratch all that. I saw the issue again. However, it appears to have manifested after I disconnected the laptop from AC power, so it might be related to power-saving behaviour. I'll have to re-test with the powersave-related options. :(
Comment 11 S. B. 2021-02-19 23:29:20 UTC
@Héctor Sanjuán -- Interesting... I'm going to be trying the opposite of what you did. I disabled my iwlwifi.conf tweak from comment #7, and I'm now using a new Mikrotik router with a 5GHz network, whereas when I opened this bug report I was having major issues with my 2.4GHz-only network. I'll report back if anything gets better or worse, but this bug is frustratingly random, as you've noted.
Comment 12 Miroslav Beneš 2022-02-18 11:06:47 UTC
Is there anything new to report or discuss? It has been a while and the latest TW kernels might have helped.
Comment 13 Héctor Sanjuán 2022-02-18 18:54:48 UTC
(In reply to Miroslav Beneš from comment #12)
> Is there anything new to report or discuss? It has been a while and the
> latest TW kernels might have helped.

Indeed. For me the issue happened when connected to the 5GHz AP.

I switched back to 5GHz today and have not seen a single driver error logged, and the connection has worked all day. I have no special modprobe settings (the things mentioned here did not work for me back in the day).

It would seem that this has been "fixed", at least for me.

I'm on 5.16.8-1-default from TW with kernel-firmware-iwlwifi-20220119-1.1.noarch.
Comment 14 S. B. 2022-02-19 15:50:08 UTC
Yeah, I'm inclined to think that this is a multi-faceted issue, but primarily complicated by the ever-worsening congestion on the 2.4GHz band. Everything seems pretty much rock-solid for me on 5GHz, so I think I'll close this bug. Feel free to re-open it if there are other users still affected by a true software bug.