Bugzilla – Bug 1213491
Realtek ethernet adapter stops working after update to 6.4.2 and 6.4.3
Last modified: 2024-06-25 17:50:25 UTC
With kernel versions 6.4.2 and 6.4.3, the RTL8111/8168/8411 Ethernet adapter on the laptop stops working after some time, around 1 hour. There is no issue with the same controller on a desktop PC. It seems to be related to a power management issue with ASPM. There is already a bug tracked upstream: https://bugzilla.kernel.org/show_bug.cgi?id=217596 Rebooting the PC temporarily solves the problem by re-enabling Ethernet, but the problem comes back. The only workaround is to not use the 6.4.2 and 6.4.3 kernels.
The bug you pointed to in the upstream bug tracker should already have hit with the 6.4 release, so it might be a different bug. In any case, please try the pcie_aspm=force boot option with the 6.4.3 kernel. This should be a workaround for that bug.
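Before and after adding the boot option, it can help to confirm what the running kernel actually received. The following is a minimal sketch, not part of the original report: it assumes a Linux system exposing `/proc/cmdline` and, where the ASPM module parameter is available, `/sys/module/pcie_aspm/parameters/policy`. The function names `pcie_aspm_option` and `active_aspm_policy` are hypothetical helpers introduced only for illustration.

```python
from pathlib import Path
from typing import Optional


def pcie_aspm_option(cmdline: str) -> Optional[str]:
    """Return the value of the last pcie_aspm= boot parameter, or None if absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("pcie_aspm="):
            value = token.split("=", 1)[1]
    return value


def active_aspm_policy(policy_text: str) -> Optional[str]:
    """Return the bracketed (currently active) entry of the ASPM policy list."""
    for word in policy_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        print("pcie_aspm boot option:", pcie_aspm_option(cmdline_file.read_text()))
    # Assumption: this sysfs file is only present when the kernel exposes it.
    policy_file = Path("/sys/module/pcie_aspm/parameters/policy")
    if policy_file.exists():
        print("active ASPM policy:", active_aspm_policy(policy_file.read_text()))
```

If the first value printed is `None` after a reboot, the option did not reach the kernel, and any test result would say nothing about ASPM.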
(In reply to Takashi Iwai from comment #1)
> The bug you suggested in the upstream bug tracker should have hit already
> with 6.4 release, so it might be a different bug.
>
> In anyway, please try pcie_aspm=force boot option with 6.4.3 kernel. This
> should be a workaround for that bug.
As said in the upstream bug, this problem is still present with 6.4.3, and pcie_aspm=force seems to cause a noticeable performance degradation.
Well, it's still doubtful why 6.4.2 worked, then. The buggy commit
  2ab19de62d67e403105ba860971e5ff0d511ad15
    r8169: remove ASPM restrictions now that ASPM is disabled during NAPI poll
is already included in the 6.4 release. So 6.4.2 should have hit the same problem if that's the cause.
And, more puzzling, there are really only a few changes between the 6.4.2 and 6.4.3 kernels. Most of them are VM fixes and irrelevant to the Realtek Ethernet driver.
I asked for a test with the pcie_aspm=force option to confirm whether the above is the cause or not. It's of course no solution per se.
(In reply to Takashi Iwai from comment #3)
> Well, it's still doubtful why 6.4.2 worked, then. The buggy commit
> 2ab19de62d67e403105ba860971e5ff0d511ad15
> r8169: remove ASPM restrictions now that ASPM is disabled during NAPI
> poll
> is already included in 6.4 release. So, 6.4.2 should have hit the same
> problem if that's the cause.
>
> And, more puzzling is that there is really only few changes between 6.4.2
> and 6.4.3 kernels. Most of them are only about the VM fixes, and irrelevant
> with the Realtek Ethernet driver.
>
> I asked to test with pcie_aspm=force option for confirming whether the above
> is the cause or not. It's of course no solution, per se.
Just to clarify, 6.4.2 doesn't work either. I tried 6.4.3 with pcie_aspm=force, with an unexpected outcome: I found no performance degradation, but the adapter stopped again after around 3 hours. Not sure whether the timing is relevant or not. It is clear that something in the 6.4.x kernels broke the Ethernet adapter.
Ah, then I totally misunderstood the description. The workaround is to go back to 6.3.x...
(In reply to Ferdinando Vivacqua from comment #4)
> Tried 6.4.3 with pcie_aspm=force: unexpected outcome.
> I've no found performance degradation, but it stopped again, after around 3
> hours. Not sure if the timing is something relevant or not.
Interesting.
To verify whether it's the same problem, I'm building a test kernel with the revert of the commit. It's being built in the OBS home:tiwai:bsc1213491 repo. Once the build finishes (it takes an hour or so), the package will be available at:
  http://download.opensuse.org/repositories/home:/tiwai:/bsc1213491/standard/
Could you give it a try later?
(In reply to Takashi Iwai from comment #5)
> Ah, then I totally misunderstood the description. The workaround is to go
> back to 6.3.x...
>
> (In reply to Ferdinando Vivacqua from comment #4)
> > Tried 6.4.3 with pcie_aspm=force: unexpected outcome.
> > I've no found performance degradation, but it stopped again, after around 3
> > hours. Not sure if the timing is something relevant or not.
>
> Interesting.
>
> To verify whether it's the same problem, I'm building a test kernel with the
> revert of the commit. It's being built in OBS home:tiwai:bsc1213491 repo.
> Once after the build finishes (takes an hour or so), the package will be
> available at:
> http://download.opensuse.org/repositories/home:/tiwai:/bsc1213491/standard/
>
> Could you give it a try later?
Is it the kernel kernel-default-6.4.4-1.1.g903492f.x86_64.rpm? I'm not able to boot it, as it stops with the error:
  ..../efi/linux.c:168 you need to load the kernel first
If Secure Boot is enabled in your BIOS, turn it off and retest.
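To check the current Secure Boot state from the running system before going into the firmware setup, something like the sketch below may be enough. It is an illustration under assumptions, not something taken from this report: it assumes the system was booted via UEFI with efivarfs mounted at `/sys/firmware/efi/efivars`, and that the `SecureBoot` variable file holds a 4-byte attribute field followed by a single value byte. The helper name `secure_boot_enabled` is hypothetical.

```python
from pathlib import Path

# EFI global-variable GUID; efivarfs names files "<VariableName>-<GUID>".
SECURE_BOOT_VAR = Path(
    "/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def secure_boot_enabled(raw: bytes) -> bool:
    """Interpret the raw efivarfs content: 4 attribute bytes, then 1 value byte."""
    if len(raw) < 5:
        raise ValueError("unexpected SecureBoot variable length")
    return raw[4] == 1


if __name__ == "__main__":
    if SECURE_BOOT_VAR.exists():
        state = secure_boot_enabled(SECURE_BOOT_VAR.read_bytes())
        print("Secure Boot enabled:", state)
    else:
        # Legacy BIOS boot, or efivarfs not mounted (assumption).
        print("SecureBoot variable not found")
```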
(In reply to Takashi Iwai from comment #7)
> If Secure Boot is enabled on your BIOS, turn it off and retest.
It seems it does work! More than 3 hours of working without problems.
OK, thanks, then this is indeed the same problem as in the upstream bugzilla. Let's see whether there will be any development upstream. If nothing happens, I'll put in a temporary revert patch as a regression workaround.
(In reply to Takashi Iwai from comment #9)
> OK, thanks, then this is indeed the same problem as in the upstream bugzilla.
>
> Let's see whether there will be any development in the upstream. If nothing
> happens, I'll put a temporary revert patch as a regression workaround.
Thank you!
Upstream took three fix commits regarding r8169, which have now landed in Linus' tree:
  162d626f3013215b82b6514ca14f20932c7ccce5
    r8169: fix ASPM-related problem for chip version 42 and 43
  cf2ffdea0839398cb0551762af7f5efb0a6e0fea
    r8169: revert 2ab19de62d67 ("r8169: remove ASPM restrictions now that ASPM is disabled during NAPI poll")
  e31a9fedc7d8d80722b19628e66fcb5a36981780
    Revert "r8169: disable ASPM during NAPI poll"
I backported those to the TW stable branch.
... and another test kernel is being built in the OBS home:tiwai:bsc1213491-2 repo. You can test it once the build finishes.
Thanks, Takashi. In the meantime I'm testing 6.4.1-1.g6fd2851-default, and after 10h of uptime all is fine. Tomorrow I will try to test the new build.
Last night I still hit a problem with the first custom kernel. I have yet to test your second build. Here is the log:
Jul 24 01:28:53 alfred kernel: ------------[ cut here ]------------
Jul 24 01:28:53 alfred kernel: NETDEV WATCHDOG: eno1 (r8169): transmit queue 0 timed out 6437 ms
Jul 24 01:28:53 alfred kernel: WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x21e/0x230
Jul 24 01:28:53 alfred kernel: Modules linked in: ccm af_packet nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink qrtr msr ext4 nls_iso8859_1 nls_cp437 mbcache vfat jbd2 fat iwlmvm snd_hda_codec_hdmi snd_sof_pci_intel_icl snd_sof_intel_hda_common mac80211 snd_hda_codec_realtek soundwire_intel soundwire_cadence snd_sof_intel_hda_mlink snd_hda_codec_generic snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp ledtrig_audio snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core libarc4 snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation soundwire_bus snd_soc_core snd_compress snd_pcm_dmaengine x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel snd_intel_dspcfg kvm_intel snd_intel_sdw_acpi snd_hda_codec spi_pxa2xx_platform dw_dmac spi_nor snd_hda_core ee1004 mei_pxp mei_hdcp mtd kvm intel_rapl_msr snd_hwdep iwlwifi snd_pcm btusb irqbypass processor_thermal_device_pci_legacy
Jul 24 01:28:53 alfred kernel: btrtl snd_timer processor_thermal_device btbcm processor_thermal_rfim pcspkr btintel processor_thermal_mbox i2c_i801 r8169 snd btmtk processor_thermal_rapl wmi_bmof cfg80211 bluetooth soundcore intel_rapl_common spi_intel_pci i2c_smbus realtek int340x_thermal_zone spi_intel mei_me mdio_devres libphy intel_lpss_pci intel_lpss ecdh_generic joydev mei rfkill idma64 intel_soc_dts_iosf fan tiny_power_button thermal acpi_tad intel_pmc_core acpi_pad button fuse efi_pstore configfs dmi_sysfs ip_tables x_tables uas usb_storage hid_logitech_hidpp hid_logitech_dj hid_generic crct10dif_pclmul crc32_pclmul usbhid polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 i915 nvme xhci_pci xhci_pci_renesas xhci_hcd aesni_intel crypto_simd cryptd sdhci_pci cqhci wdat_wdt i2c_algo_bit sdhci drm_buddy nvme_core drm_display_helper usbcore mmc_core cec rc_core ttm video wmi pinctrl_jasperlake btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
Jul 24 01:28:53 alfred kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: G U 6.4.1-1.g6fd2851-default #1 openSUSE Tumbleweed (unreleased) a74e0f0a6765b1b2b400108eb36c99233f07085b
Jul 24 01:28:53 alfred kernel: Hardware name: Intel(R) Client Systems NUC11ATKC4/NUC11ATBC4, BIOS ATJSLCPX.0039.2023.0221.1502 02/21/2023
Jul 24 01:28:53 alfred kernel: RIP: 0010:dev_watchdog+0x21e/0x230
Jul 24 01:28:53 alfred kernel: Code: ff ff ff 48 89 df c6 05 d5 85 fd 00 01 e8 9a 3e fa ff 45 89 f8 44 89 f1 48 89 de 48 89 c2 48 c7 c7 90 90 8a b8 e8 12 1d 5f ff <0f> 0b e9 2d ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90
Jul 24 01:28:53 alfred kernel: RSP: 0018:ffffa568c017cea0 EFLAGS: 00010286
Jul 24 01:28:53 alfred kernel: RAX: 0000000000000000 RBX: ffff88fa8af00000 RCX: 000000000000083f
Jul 24 01:28:53 alfred kernel: RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f
Jul 24 01:28:53 alfred kernel: RBP: ffff88fa8af004c8 R08: 0000000000000000 R09: ffffa568c017cd48
Jul 24 01:28:53 alfred kernel: R10: 0000000000000003 R11: ffffffffb8b58cc8 R12: ffff88fa81798000
Jul 24 01:28:53 alfred kernel: R13: ffff88fa8af0041c R14: 0000000000000000 R15: 0000000000001925
Jul 24 01:28:53 alfred kernel: FS: 0000000000000000(0000) GS:ffff88fdefe80000(0000) knlGS:0000000000000000
Jul 24 01:28:53 alfred kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 24 01:28:53 alfred kernel: CR2: 00007f983affdd58 CR3: 0000000438c36000 CR4: 0000000000350ee0
Jul 24 01:28:53 alfred kernel: Call Trace:
Jul 24 01:28:53 alfred kernel: <IRQ>
Jul 24 01:28:53 alfred kernel: ? dev_watchdog+0x21e/0x230
Jul 24 01:28:53 alfred kernel: ? __warn+0x81/0x130
Jul 24 01:28:53 alfred kernel: ? dev_watchdog+0x21e/0x230
Jul 24 01:28:53 alfred kernel: ? report_bug+0x171/0x1a0
Jul 24 01:28:53 alfred kernel: ? native_write_msr+0xa/0x30
Jul 24 01:28:53 alfred kernel: ? handle_bug+0x3c/0x80
Jul 24 01:28:53 alfred kernel: ? exc_invalid_op+0x17/0x70
Jul 24 01:28:53 alfred kernel: ? asm_exc_invalid_op+0x1a/0x20
Jul 24 01:28:53 alfred kernel: ? dev_watchdog+0x21e/0x230
Jul 24 01:28:53 alfred kernel: ? __pfx_dev_watchdog+0x10/0x10
Jul 24 01:28:53 alfred kernel: ? __pfx_dev_watchdog+0x10/0x10
Jul 24 01:28:53 alfred kernel: call_timer_fn+0x24/0x130
Jul 24 01:28:53 alfred kernel: __run_timers.part.0+0x1d8/0x280
Jul 24 01:28:53 alfred kernel: ? __hrtimer_run_queues+0x121/0x2b0
Jul 24 01:28:53 alfred kernel: ? ktime_get+0x39/0xa0
Jul 24 01:28:53 alfred kernel: run_timer_softirq+0x2a/0x50
Jul 24 01:28:53 alfred kernel: __do_softirq+0xc7/0x2a5
Jul 24 01:28:53 alfred kernel: __irq_exit_rcu+0xae/0xe0
Jul 24 01:28:53 alfred kernel: sysvec_apic_timer_interrupt+0x72/0x90
Jul 24 01:28:53 alfred kernel: </IRQ>
Jul 24 01:28:53 alfred kernel: <TASK>
Jul 24 01:28:53 alfred kernel: asm_sysvec_apic_timer_interrupt+0x1a/0x20
Jul 24 01:28:53 alfred kernel: RIP: 0010:cpuidle_enter_state+0xcc/0x440
Jul 24 01:28:53 alfred kernel: Code: 1a 35 48 ff e8 d5 f1 ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 03 42 47 ff 45 84 ff 0f 85 56 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
Jul 24 01:28:53 alfred kernel: RSP: 0018:ffffa568c012fe90 EFLAGS: 00000246
Jul 24 01:28:53 alfred kernel: RAX: ffff88fdefeba040 RBX: ffff88fdefec5700 RCX: 0000000000000000
Jul 24 01:28:53 alfred kernel: RDX: 0000000000000001 RSI: fffffff64d245137 RDI: 0000000000000000
Jul 24 01:28:53 alfred kernel: RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000401a41a4
Jul 24 01:28:53 alfred kernel: R10: ffff88fdefeb8a44 R11: 000000000000bc50 R12: ffffffffb8c25c40
Jul 24 01:28:53 alfred kernel: R13: 00002cef36771d18 R14: 0000000000000003 R15: 0000000000000000
Jul 24 01:28:53 alfred kernel: cpuidle_enter+0x2d/0x40
Jul 24 01:28:53 alfred kernel: do_idle+0x20d/0x270
Jul 24 01:28:53 alfred kernel: cpu_startup_entry+0x1d/0x20
Jul 24 01:28:53 alfred kernel: start_secondary+0x12e/0x150
Jul 24 01:28:53 alfred kernel: secondary_startup_64_no_verify+0x10b/0x10b
Jul 24 01:28:53 alfred kernel: </TASK>
Jul 24 01:28:53 alfred kernel: ---[ end trace 0000000000000000 ]---
Jul 24 01:28:55 alfred kernel: pcieport 0000:00:1c.7: Data Link Layer Link Active not set in 1000 msec
Jul 24 01:28:55 alfred kernel: r8169 0000:02:00.0 eno1: Can't reset secondary PCI bus, detach NIC
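For long-running tests like the ones in this report, it may be convenient to scan the kernel log for the transmit-timeout signature instead of waiting for the link to drop visibly. The following is a rough sketch that matches the `NETDEV WATCHDOG` line format seen in the log above. The names `TxTimeout` and `find_tx_timeouts` are hypothetical, and treating the trailing `ms` value as optional is an assumption about how other kernel versions may print this message.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional


class TxTimeout(NamedTuple):
    iface: str
    driver: str
    queue: int
    timeout_ms: Optional[int]  # None when the line carries no duration


# Pattern modelled on the line in this report; the "<n> ms" suffix is optional.
_WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out(?: (?P<ms>\d+) ms)?"
)


def find_tx_timeouts(lines: Iterable[str]) -> Iterator[TxTimeout]:
    """Yield one TxTimeout per kernel log line that reports a TX queue timeout."""
    for line in lines:
        m = _WATCHDOG_RE.search(line)
        if m:
            ms = int(m["ms"]) if m["ms"] is not None else None
            yield TxTimeout(m["iface"], m["driver"], int(m["queue"]), ms)


SAMPLE = (
    "Jul 24 01:28:53 alfred kernel: NETDEV WATCHDOG: eno1 (r8169): "
    "transmit queue 0 timed out 6437 ms"
)

if __name__ == "__main__":
    for hit in find_tx_timeouts([SAMPLE]):
        print(hit)
```

In practice the input would be the lines of a saved kernel log rather than the single sample string used here.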
(In reply to Takashi Iwai from comment #12)
> ... and another test kernel is being built in OBS home:tiwai:bsc1213491-2
> repo.
> You can test it later once after the build finishes.
Hi Takashi, sorry for being late. Do you still need me to test this second custom kernel? From the upstream bug tracker, though, it seems that we need to wait for the 6.5 branch, right? If so, can the revert patch be brought forward into the openSUSE TW kernel? Thank you!
All the revert and fix patches for r8169 have already been merged into the TW stable git branch, and they'll eventually be included in a later TW release. So, could you rather confirm that the kernel in the OBS Kernel:stable repo works? If the bug still isn't fixed there, we'll need to report it upstream.
(In reply to Takashi Iwai from comment #17)
> The all revert and fix patches for r8169 have been already merged in TW
> stable git branch, and it'll be eventually included in TW release later.
>
> So, could you rather confirm that the kernel in OBS Kernel:stable repo works?
> If the bug isn't still fixed there, we'll need to report to the upstream.
OK, I'm going to test kernel-default-6.4.6-3.1.g74a8144.x86_64.rpm and let you know. Thanks!
(In reply to Takashi Iwai from comment #17)
> The all revert and fix patches for r8169 have been already merged in TW
> stable git branch, and it'll be eventually included in TW release later.
>
> So, could you rather confirm that the kernel in OBS Kernel:stable repo works?
> If the bug isn't still fixed there, we'll need to report to the upstream.
Hi! After more than 30 hours without any issue, I think your kernel kernel-default-6.4.6-3.1.g74a8144.x86_64.rpm is OK. It works!
OK, let's close now.