Bug 1178015 - kernel-default-5.3.18-lp152.47.2 regression: network deadlock
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Distribution
Component: Kernel
Version: Leap 15.2
Hardware: x86-64 openSUSE Leap 15.2
Importance: P5 - None : Critical
Assigned To: openSUSE Kernel Bugs
Reported: 2020-10-22 12:10 UTC by Bernhard Wiedemann
Modified: 2022-01-07 16:55 UTC (History)

Attachments
dmesg with sysrq-w (122.09 KB, text/plain)
2020-10-22 12:10 UTC, Bernhard Wiedemann
full 2nd dmesg (759.55 KB, text/plain)
2020-10-22 16:40 UTC, Bernhard Wiedemann

Description Bernhard Wiedemann 2020-10-22 12:10:12 UTC
Created attachment 842914
dmesg with sysrq-w

My SUSE ThinkPad T495s was previously working fine with kernel-default-5.3.18-lp152.44.1.x86_64.

With kernel-default-5.3.18-lp152.47.2, it got stuck in all network operations (e.g. ip a, ping, wg-quick down) after Wi-Fi disassociated at 6567s uptime.

lspci -nn has:
01:00.0 Network controller [0280]: Intel Corporation Wireless-AC 9260 [8086:2526] (rev 29)
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0e)

and
hwinfo --network also shows the docking-station ethernet:
  SysFS ID: /class/net/eth1
  SysFS Device Link: /devices/pci0000:00/0000:00:08.1/0000:05:00.3/usb3/3-1/3-1.1/3-1.1:1.0
  Hardware Class: network interface
  Model: "Ethernet network interface"
  Driver: "r8152"
  Driver Modules: "r8152"
  Device File: eth1


eth* is not used. But maybe NetworkManager tried to activate eth1, because there is a cable plugged in.
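
The driver binding that hwinfo reports above can also be read directly from sysfs. The following is a minimal sketch, not part of the original report: it assumes the usual Linux layout in which /sys/class/net/<interface>/device/driver is a symlink to the bound driver, and the function name interface_drivers is made up for illustration.

```python
import os


def interface_drivers(sysfs_root="/sys/class/net"):
    """Map each entry under sysfs_root to its kernel driver name, or None.

    Virtual interfaces such as lo have no device/driver symlink and map to None.
    """
    drivers = {}
    if not os.path.isdir(sysfs_root):
        return drivers
    for ifname in sorted(os.listdir(sysfs_root)):
        driver_link = os.path.join(sysfs_root, ifname, "device", "driver")
        if os.path.islink(driver_link):
            # The symlink target ends in the driver name, e.g. ".../drivers/r8152".
            drivers[ifname] = os.path.basename(os.readlink(driver_link))
        else:
            drivers[ifname] = None
    return drivers


if __name__ == "__main__":
    for ifname, driver in interface_drivers().items():
        print(f"{ifname}: {driver or '(no device driver)'}")
```

On the laptop described here, this would be expected to list eth1 with the r8152 driver, matching the hwinfo output.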
Comment 1 Takashi Iwai 2020-10-22 12:34:01 UTC
I guess this is not really a regression but an intermittent crash due to a long-standing issue in the r8152 driver.  See bsc#1174886.

Is it reproduced reliably?  If yes, I'll cook a test kernel package with the possible fix patch.
Comment 2 Takashi Iwai 2020-10-22 12:44:53 UTC
On second thought, this looks rather like a different code path, in the runtime PM after the disconnection.  Let me check more.
Comment 3 Takashi Iwai 2020-10-22 14:37:00 UTC
I took a quick glance at the changes between 5.3.18-44 and 5.3.18-47, but there were no relevant changes for either the net core or the wireless core.  There were also no significant changes to r8152 or iwlwifi.  So this doesn't look like a regression from the code POV.
Comment 4 Bernhard Wiedemann 2020-10-22 14:40:55 UTC
I now got a comparable crash with kernel-default-5.3.18-lp152.44.1
when I closed the lid to suspend the laptop while still attached to the docking station

so not sure why it did not crash before.

[ 9822.663872] sysrq: Show Blocked State

 task:kworker/6:0     state:D stack:    0 pid:   49 ppid:     2 flags:0x80004000 
[ 9822.663935] Workqueue: ipv6_addrconf addrconf_verify_work
[ 9822.663937] Call Trace:
[ 9822.663947]  __schedule+0x2fd/0x750
[ 9822.663952]  schedule+0x2f/0xa0
[ 9822.663955]  schedule_preempt_disabled+0xa/0x10
[ 9822.663958]  __mutex_lock.isra.9+0x26d/0x4e0
[ 9822.663962]  ? addrconf_verify_work+0xa/0x20
[ 9822.663963]  addrconf_verify_work+0xa/0x20
[ 9822.663969]  process_one_work+0x1f4/0x3e0
[ 9822.663972]  worker_thread+0x2d/0x3e0
[ 9822.663975]  ? process_one_work+0x3e0/0x3e0 
[ 9822.663977]  kthread+0x10d/0x130
[ 9822.663980]  ? kthread_park+0xa0/0xa0
[ 9822.663982]  ret_from_fork+0x22/0x40 

task:NetworkManager  state:D stack:    0 pid: 1555 ppid:     1 flags:0x00000000 
[ 9822.664127] Call Trace:
[ 9822.664131]  __schedule+0x2fd/0x750
[ 9822.664134]  schedule+0x2f/0xa0 
[ 9822.664138]  rpm_resume+0x107/0x770
[ 9822.664143]  ? wait_woken+0x80/0x80
[ 9822.664145]  rpm_resume+0x571/0x770
[ 9822.664149]  ? d_add+0xd3/0x180
[ 9822.664152]  __pm_runtime_resume+0x47/0x70
[ 9822.664170]  usb_autopm_get_interface+0x1a/0x40 [usbcore]
[ 9822.664179]  rtl8152_get_wol+0x1d/0x90 [r8152]
[ 9822.664185]  dev_ethtool+0x1c16/0x2930
[ 9822.664189]  ? walk_component+0x48/0x300
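
A sysrq-w dump like the one above can be condensed to one line per blocked task. The following is a minimal sketch, not part of the original report: the function names parse_blocked_tasks and blocking_frame are made up for illustration, the regular expressions assume the line format seen in this excerpt, and the sample is a trimmed copy of the trace above.

```python
import re

# Optional dmesg timestamp prefix such as "[ 9822.663947] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")
# Task header line: "task:<comm>  state:<S> stack: ... pid: <n> ppid: ...".
TASK = re.compile(r"task:(?P<comm>.+?)\s+state:(?P<state>\S+)\s.*?pid:\s*(?P<pid>\d+)")
# Stack frame: optional "? " marker, then symbol+0xOFFSET/0xSIZE.
FRAME = re.compile(r"^\s*(?P<unreliable>\?\s+)?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")
# Frames that only say "the task is sleeping", not what it is waiting for.
SCHEDULER_FRAMES = {"__schedule", "schedule", "schedule_preempt_disabled"}


def parse_blocked_tasks(text):
    """Return a list of dicts with comm, pid, state and reliable frames."""
    tasks = []
    current = None
    for raw in text.splitlines():
        line = TIMESTAMP.sub("", raw)
        match = TASK.search(line)
        if match:
            current = {"comm": match["comm"], "pid": int(match["pid"]),
                       "state": match["state"], "frames": []}
            tasks.append(current)
            continue
        match = FRAME.match(line)
        # Frames marked with "?" are stale stack contents, so skip them.
        if match and current is not None and not match["unreliable"]:
            current["frames"].append(match["symbol"])
    return tasks


def blocking_frame(task):
    """Return the first frame below the scheduler entry points, or None."""
    for symbol in task["frames"]:
        if symbol not in SCHEDULER_FRAMES:
            return symbol
    return None


SAMPLE = """\
 task:kworker/6:0     state:D stack:    0 pid:   49 ppid:     2 flags:0x80004000
[ 9822.663937] Call Trace:
[ 9822.663947]  __schedule+0x2fd/0x750
[ 9822.663952]  schedule+0x2f/0xa0
[ 9822.663955]  schedule_preempt_disabled+0xa/0x10
[ 9822.663958]  __mutex_lock.isra.9+0x26d/0x4e0
[ 9822.663962]  ? addrconf_verify_work+0xa/0x20
[ 9822.663963]  addrconf_verify_work+0xa/0x20
task:NetworkManager  state:D stack:    0 pid: 1555 ppid:     1 flags:0x00000000
[ 9822.664127] Call Trace:
[ 9822.664131]  __schedule+0x2fd/0x750
[ 9822.664134]  schedule+0x2f/0xa0
[ 9822.664138]  rpm_resume+0x107/0x770
[ 9822.664170]  usb_autopm_get_interface+0x1a/0x40 [usbcore]
[ 9822.664179]  rtl8152_get_wol+0x1d/0x90 [r8152]
"""

if __name__ == "__main__":
    for task in parse_blocked_tasks(SAMPLE):
        print(f"{task['comm']} (pid {task['pid']}) blocked in {blocking_frame(task)}")
```

For the excerpt, this reduces the picture to the kworker waiting on a mutex and NetworkManager waiting in rpm_resume, reached via rtl8152_get_wol.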
Comment 5 Takashi Iwai 2020-10-22 15:45:38 UTC
Hm, are those all the stack traces?  If there's something else over r8152_set_mac_address(), it can explain why it deadlocks...
Comment 6 Takashi Iwai 2020-10-22 16:08:42 UTC
Anyway, I'm trying to build a Leap 15.2 kernel with a hopefully-fixes-something patch.  It's being built in the OBS home:tiwai:bsc1178015 repo.
Please give it a try later.
Comment 7 Bernhard Wiedemann 2020-10-22 16:40:50 UTC
Created attachment 842939
full 2nd dmesg
Comment 8 Miroslav Beneš 2022-01-07 14:49:06 UTC
Bernhard, have you had a chance to test Takashi's kernel? Does the issue persist in Leap 15.3 (15.2 is not supported anymore)?
Comment 9 Bernhard Wiedemann 2022-01-07 16:55:25 UTC
I had already upgraded to 15.3 some months ago.
I retested it today by plugging an Ethernet cable into the docking station
and it survived 4 suspend+resume cycles, so I count it as fixed.

Chances are, this was never a regression, but was only triggered later because the cable activated the RTL chip in the docking station.