Bugzilla – Bug 1178015
kernel-default-5.3.18-lp152.47.2 regression: network deadlock
Last modified: 2022-01-07 16:55:25 UTC
Created attachment 842914 [details]
dmesg with sysrq-w
my SUSE Thinkpad T495s
was previously working fine with kernel-default-5.3.18-lp152.44.1.x86_64
got stuck in all network operations after Wifi disassociated at 6567s uptime
(e.g. ip a, ping, wg-quick down)
lspci -nn has:
01:00.0 Network controller : Intel Corporation Wireless-AC 9260 [8086:2526] (rev 29)
03:00.0 Ethernet controller : Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0e)
hwinfo --network also shows the docking-station ethernet:
SysFS ID: /class/net/eth1
SysFS Device Link: /devices/pci0000:00/0000:00:08.1/0000:05:00.3/usb3/3-1/3-1.1/3-1.1:1.0
Hardware Class: network interface
Model: "Ethernet network interface"
Driver Modules: "r8152"
Device File: eth1
eth* is not used. But maybe NetworkManager tried to activate eth1, because there is a cable plugged in.
I guess this is not really a regression but the intermittent crash due to a long-standing issue of r8152 driver. See bsc#1174886.
Is it reproduced reliably? If yes, I'll cook a test kernel package with the possible fix patch.
On the second thought, this looks rather like a different code path, at the runtime pm after the disconnection. Let me check more.
I took a quick glance between 5.3.18-44 and 5.3.18-47, but there was no relevant changes for either the net core or the wireless core. Also no significant changes about r8152 and iwlwifi. So this doesn't look like a regression from the code POV.
I now got a comparable crash with kernel-default-5.3.18-lp152.44.1
when I closed the lid to suspend the laptop while still attached to the docking station
so not sure why it did not crash before.
[ 9822.663872] sysrq: Show Blocked State
task:kworker/6:0 state:D stack: 0 pid: 49 ppid: 2 flags:0x80004000
[ 9822.663935] Workqueue: ipv6_addrconf addrconf_verify_work
[ 9822.663937] Call Trace:
[ 9822.663947] __schedule+0x2fd/0x750
[ 9822.663952] schedule+0x2f/0xa0
[ 9822.663955] schedule_preempt_disabled+0xa/0x10
[ 9822.663958] __mutex_lock.isra.9+0x26d/0x4e0
[ 9822.663962] ? addrconf_verify_work+0xa/0x20
[ 9822.663963] addrconf_verify_work+0xa/0x20
[ 9822.663969] process_one_work+0x1f4/0x3e0
[ 9822.663972] worker_thread+0x2d/0x3e0
[ 9822.663975] ? process_one_work+0x3e0/0x3e0
[ 9822.663977] kthread+0x10d/0x130
[ 9822.663980] ? kthread_park+0xa0/0xa0
[ 9822.663982] ret_from_fork+0x22/0x40
task:NetworkManager state:D stack: 0 pid: 1555 ppid: 1 flags:0x00000000
[ 9822.664127] Call Trace:
[ 9822.664131] __schedule+0x2fd/0x750
[ 9822.664134] schedule+0x2f/0xa0
[ 9822.664138] rpm_resume+0x107/0x770
[ 9822.664143] ? wait_woken+0x80/0x80
[ 9822.664145] rpm_resume+0x571/0x770
[ 9822.664149] ? d_add+0xd3/0x180
[ 9822.664152] __pm_runtime_resume+0x47/0x70
[ 9822.664170] usb_autopm_get_interface+0x1a/0x40 [usbcore]
[ 9822.664179] rtl8152_get_wol+0x1d/0x90 [r8152]
[ 9822.664185] dev_ethtool+0x1c16/0x2930
[ 9822.664189] ? walk_component+0x48/0x300
Hm, are those the all stack traces? If there's something else over r8152_set_mac_address(), it can explain why it deadlocks...
In anyway, I'm trying to build a Leap 15.2 kernel with a hopefully-fixes-something patch. It's being built in OBS home:tiwai:bsc1178015 repo.
Please give it a try later.
Created attachment 842939 [details]
full 2nd dmesg
Bernhard, have you had a change to test Takashi's kernel? Does the issue persist in Leap 15.3 (15.2 is not supported anymore)?
I had already upgraded to 15.3 some months ago.
I retested it today by plugging an Ethernet cable into the docking station
and it survived 4 suspend+resume cycles, so I count it as fixed.
Chances are, this was never a regression, but only was triggered later because the cable activated the RTL chip in the docking-station.