Bugzilla – Bug 1218552
[ 402.012325] pcieport 0000:00:01.2: AER: Corrected error received: 0000:01:00.0 - Steam Deck
Last modified: 2024-03-13 13:13:10 UTC
Created attachment 871668 [details]
dmesg-pci-eror

Hi everyone,

Every once in a while, this Steam Deck prints this error and the system drops me to a tty. I do not know why, and I do not know how to reproduce the error. I did replace the SSD with a WD Black, as listed below.

lsb_release -a
LSB Version: n/a
Distributor ID: openSUSE
Description: openSUSE Tumbleweed
Release: 20231228
Codename: n/a

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
    Vendor: Valve
    Version: F7A0120
    Release Date: 12/01/2023
    Address: 0xE0000
    Runtime Size: 128 kB
    ROM Size: 16 MB

Information for package kernel-default:
---------------------------------------
Repository : openSUSE-Tumbleweed-Oss
Name : kernel-default
Version : 6.6.7-1.1
Arch : x86_64
Vendor : openSUSE
Installed Size : 238.1 MiB
Installed : Yes
Status : up-to-date
Source package : kernel-default-6.6.7-1.1.nosrc
Upstream URL : https://www.kernel.org/
Summary : The Standard Kernel
Description :
    The standard kernel for both uniprocessor and multiprocessor systems.

    Source Timestamp: 2023-12-14 17:36:48 +0000
    GIT Revision: 6869d093e8485475463bc171d23d7c4142fb6fa4
    GIT Branch: stable

=== START OF INFORMATION SECTION ===
Model Number: WD_BLACK SN770M 1TB
Serial Number: 233101400993
Firmware Version: 731100WD
PCI Vendor/Subsystem ID: 0x15b7
IEEE OUI Identifier: 0x001b44
Total NVM Capacity: 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 0
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 001b44 4a48dc08dc
Local Time is: Thu Jan 4 17:36:48 2024 PST
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x7e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg Log0_FISE_MI Telmtry_Ar_4

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 36 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 3,267,997 [1.67 TB]
Data Units Written: 3,844,737 [1.96 TB]
Host Read Commands: 22,073,106
Host Write Commands: 62,512,660
Controller Busy Time: 79
Power Cycles: 576
Power On Hours: 46
Unsafe Shutdowns: 123

[ 402.012325] pcieport 0000:00:01.2: AER: Corrected error received: 0000:01:00.0
[ 402.012342] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 402.012346] nvme 0000:01:00.0: device [15b7:5042] error status/mask=00000001/0000e000
[ 402.012351] nvme 0000:01:00.0: [ 0] RxErr
[ 421.302005] usb 3-1.1: new full-speed USB device
lspci -vnn 00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Root Complex [1022:1645] Subsystem: Valve Software Device [1e44:1776] Flags: fast devsel 00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] VanGogh IOMMU [1022:1646] Subsystem: Valve Software Device [1e44:1776] Flags: bus master, fast devsel, latency 0, IRQ -2147483648 Capabilities: <access denied> 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632] Flags: fast devsel, IOMMU group 0 00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh PCIe GPP Bridge [1022:1647] (prog-if 00 [Normal decode]) Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:1453] Flags: bus master, fast devsel, latency 0, IRQ 28, IOMMU group 1 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: [disabled] [32-bit] Memory behind bridge: 80600000-806fffff [size=1M] [32-bit] Prefetchable memory behind bridge: [disabled] [64-bit] Capabilities: <access denied> Kernel driver in use: pcieport 00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh PCIe GPP Bridge [1022:1647] (prog-if 00 [Normal decode]) Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:1453] Flags: bus master, fast devsel, latency 0, IRQ 29, IOMMU group 2 Bus: primary=00, secondary=02, subordinate=02, sec-latency=0 I/O behind bridge: [disabled] [32-bit] Memory behind bridge: 80500000-805fffff [size=1M] [32-bit] Prefetchable memory behind bridge: [disabled] [64-bit] Capabilities: <access denied> Kernel driver in use: pcieport 00:01.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh PCIe GPP Bridge [1022:1647] (prog-if 00 [Normal decode]) Subsystem: Advanced Micro Devices, Inc. 
[AMD] Device [1022:1453] Flags: bus master, fast devsel, latency 0, IRQ 30, IOMMU group 3 Bus: primary=00, secondary=03, subordinate=03, sec-latency=0 I/O behind bridge: 2000-2fff [size=4K] [16-bit] Memory behind bridge: 80400000-804fffff [size=1M] [32-bit] Prefetchable memory behind bridge: [disabled] [64-bit] Capabilities: <access denied> Kernel driver in use: pcieport 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632] Flags: fast devsel, IOMMU group 4 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648] (prog-if 00 [Normal decode]) Subsystem: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648] Flags: bus master, fast devsel, latency 0, IRQ 31, IOMMU group 4 Bus: primary=00, secondary=04, subordinate=04, sec-latency=0 I/O behind bridge: 1000-1fff [size=4K] [16-bit] Memory behind bridge: 80000000-803fffff [size=4M] [32-bit] Prefetchable memory behind bridge: f8e0000000-f8f01fffff [size=258M] [32-bit] Capabilities: <access denied> Kernel driver in use: pcieport 00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648] (prog-if 00 [Normal decode]) Subsystem: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648] Flags: bus master, fast devsel, latency 0, IRQ 32, IOMMU group 4 Bus: primary=00, secondary=05, subordinate=05, sec-latency=0 I/O behind bridge: [disabled] [32-bit] Memory behind bridge: [disabled] [32-bit] Prefetchable memory behind bridge: [disabled] [64-bit] Capabilities: <access denied> Kernel driver in use: pcieport 00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648] (prog-if 00 [Normal decode]) Subsystem: Advanced Micro Devices, Inc. 
[AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648] Flags: bus master, fast devsel, latency 0, IRQ 33, IOMMU group 4 Bus: primary=00, secondary=06, subordinate=06, sec-latency=0 I/O behind bridge: [disabled] [32-bit] Memory behind bridge: [disabled] [32-bit] Prefetchable memory behind bridge: [disabled] [64-bit] Capabilities: <access denied> Kernel driver in use: pcieport 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 71) Subsystem: Valve Software Device [1e44:1776] Flags: 66MHz, medium devsel, IOMMU group 5 Kernel driver in use: piix4_smbus Kernel modules: i2c_piix4, sp5100_tco 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51) Subsystem: Valve Software Device [1e44:1776] Flags: bus master, 66MHz, medium devsel, latency 0, IOMMU group 5 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 0 [1022:1660] Flags: fast devsel, IOMMU group 6 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 1 [1022:1661] Flags: fast devsel, IOMMU group 6 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 2 [1022:1662] Flags: fast devsel, IOMMU group 6 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 3 [1022:1663] Flags: fast devsel, IOMMU group 6 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 4 [1022:1664] Flags: fast devsel, IOMMU group 6 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 5 [1022:1665] Flags: fast devsel, IOMMU group 6 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 6 [1022:1666] Flags: fast devsel, IOMMU group 6 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. 
[AMD] VanGogh Data Fabric; Function 7 [1022:1667] Flags: fast devsel, IOMMU group 6 01:00.0 Non-Volatile memory controller [0108]: Sandisk Corp Device [15b7:5042] (rev 01) (prog-if 02 [NVM Express]) Subsystem: Sandisk Corp Device [15b7:5042] Flags: bus master, fast devsel, latency 0, IRQ 49, IOMMU group 7 Memory at 80600000 (64-bit, non-prefetchable) [size=16K] Capabilities: <access denied> Kernel driver in use: nvme Kernel modules: nvme 02:00.0 SD Host controller [0805]: O2 Micro, Inc. SD/MMC Card Reader Controller [1217:8621] (rev 01) (prog-if 01) Subsystem: Valve Software Device [1e44:1776] Flags: bus master, fast devsel, latency 0, IRQ 39, IOMMU group 8 Memory at 80501000 (32-bit, non-prefetchable) [size=4K] Memory at 80500000 (32-bit, non-prefetchable) [size=2K] Capabilities: <access denied> Kernel driver in use: sdhci-pci Kernel modules: sdhci_pci 03:00.0 Network controller [0280]: Realtek Semiconductor Co., Ltd. RTL8822CE 802.11ac PCIe Wireless Network Adapter [10ec:c822] DeviceName: Broadcom 5762 Subsystem: AzureWave Device [1a3b:4210] Flags: bus master, fast devsel, latency 0, IRQ 73, IOMMU group 9 I/O ports at 2000 [size=256] Memory at 80400000 (64-bit, non-prefetchable) [size=64K] Capabilities: <access denied> Kernel driver in use: rtw_8822ce Kernel modules: rtw88_8822ce 04:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] VanGogh [AMD Custom GPU 0405] [1002:163f] (rev ae) (prog-if 00 [VGA controller]) Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:0123] Flags: bus master, fast devsel, latency 0, IRQ 40, IOMMU group 4 Memory at f8e0000000 (64-bit, prefetchable) [size=256M] Memory at f8f0000000 (64-bit, prefetchable) [size=2M] I/O ports at 1000 [size=256] Memory at 80300000 (32-bit, non-prefetchable) [size=512K] Capabilities: <access denied> Kernel driver in use: amdgpu Kernel modules: amdgpu 04:00.1 Audio device [0403]: Advanced Micro Devices, Inc. 
[AMD/ATI] Rembrandt Radeon High Definition Audio Controller [1002:1640] Subsystem: Valve Software Device [1e44:1776] Flags: bus master, fast devsel, latency 0, IRQ 72, IOMMU group 4 Memory at 803c0000 (32-bit, non-prefetchable) [size=16K] Capabilities: <access denied> Kernel driver in use: snd_hda_intel Kernel modules: snd_hda_intel 04:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 19h PSP/CCP [1022:1649] Subsystem: Valve Software Device [1e44:1776] Flags: bus master, fast devsel, latency 0, IRQ 35, IOMMU group 4 Memory at 80200000 (32-bit, non-prefetchable) [size=1M] Memory at 803c4000 (32-bit, non-prefetchable) [size=8K] Capabilities: <access denied> Kernel driver in use: ccp Kernel modules: ccp 04:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] VanGogh USB0 [1022:163a] (prog-if fe [USB Device]) Subsystem: Valve Software Device [1e44:1776] Flags: bus master, fast devsel, latency 0, IRQ 69, IOMMU group 4 Memory at 80000000 (64-bit, non-prefetchable) [size=1M] Capabilities: <access denied> Kernel driver in use: dwc3-pci Kernel modules: dwc3_pci 04:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] VanGogh USB1 [1022:163b] (prog-if 30 [XHCI]) Subsystem: Valve Software Device [1e44:1776] Flags: bus master, fast devsel, latency 0, IRQ 40, IOMMU group 4 Memory at 80100000 (64-bit, non-prefetchable) [size=1M] Capabilities: <access denied> Kernel driver in use: xhci_hcd Kernel modules: xhci_pci 04:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. 
[AMD] ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 50) Subsystem: Valve Software Device [1e44:1776] Flags: bus master, fast devsel, latency 0, IRQ 70, IOMMU group 4 Memory at 80380000 (32-bit, non-prefetchable) [size=256K] Capabilities: <access denied> Kernel driver in use: snd_pci_acp5x Kernel modules: snd_pci_acp3x, snd_rn_pci_acp3x, snd_pci_acp5x, snd_pci_acp6x, snd_acp_pci, snd_rpl_pci_acp6x, snd_pci_ps, snd_sof_amd_renoir, snd_sof_amd_rembrandt, snd_sof_amd_vangogh 05:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a] (rev 61) Subsystem: Valve Software Device [1e44:1776] Flags: fast devsel, IOMMU group 4 Capabilities: <access denied> 06:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a] Subsystem: Valve Software Device [1e44:1776] Flags: fast devsel, IOMMU group 4 Capabilities: <access denied>
Random idea: disable the power save modes on the PCI link if they are enabled.

nvme_core.default_ps_max_latency_us=0

Some details on this topic:

https://unix.stackexchange.com/questions/612096/clarifying-nvme-apst-problems-for-linux
(In reply to Daniel Wagner from comment #2)
> Random idea: disable the power save modes on the PCI link if they are
> enabled.
>
> nvme_core.default_ps_max_latency_us=0
>
> Some details on this topic:
>
> https://unix.stackexchange.com/questions/612096/clarifying-nvme-apst-problems-for-linux

Are you looking for something in particular? Are you waiting until I see "AER: Corrected error received: 0000:01:00.0" again?
No, I just gave you a tip on how you could work around the problem.

Looking at it again, the AER message reports that a transport error has been corrected (ECC). This is good; everything is working as expected. According to the upstream PCI maintainers, you can disable this message (feature?) by adding "pci=noaer" as a boot parameter, though I wouldn't recommend doing this.
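For anyone reading the log in the first comment, here is a minimal sketch of how the "error status/mask" words can be decoded. It assumes the bit layout of the PCIe AER Correctable Error Status register and the short labels the kernel prints in dmesg; the helper names (decode_bits, decode_status_line) are made up for illustration and are not part of any tool.

```python
import re

# Bit positions of the PCIe AER Correctable Error Status register,
# labelled with the short names that appear in kernel AER messages.
# (Assumed mapping; verify against the PCIe spec or the kernel AER
# documentation before relying on it.)
CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}

STATUS_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def decode_bits(value):
    """Return the labels of all known bits set in value, lowest bit first."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if value & (1 << bit)]


def decode_status_line(line):
    """Decode one 'error status/mask=...' dmesg line, or return None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    mask = int(match.group(2), 16)
    return {"reported": decode_bits(status), "masked": decode_bits(mask)}


line = ("[ 402.012346] nvme 0000:01:00.0: device [15b7:5042] "
        "error status/mask=00000001/0000e000")
print(decode_status_line(line))
```

For the line from this report, only bit 0 (RxErr) is set in the status word, which matches the "[ 0] RxErr" line that follows it in the log; the mask covers bits 13 to 15.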
BTW, you could still try disabling the power save modes to see whether this makes the ECC messages go away. Some WDC devices need the NVME_QUIRK_NO_DEEPEST_PS quirk; maybe this device is one of them.
(In reply to Daniel Wagner from comment #5)
> BTW, you could still try disabling the power save modes to see whether this
> makes the ECC messages go away. Some WDC devices need the
> NVME_QUIRK_NO_DEEPEST_PS quirk; maybe this device is one of them.

Hmmm. I contacted WD and they told me I am running the newest firmware. I asked them whether they could forward my information to their engineers to fix this SSD. I am a direct consumer after all, and they did advertise that this SSD works on Steam Decks.

I might try that quirk in the future.
Unfortunately, some manufacturers are not so keen on updating consumer devices. I don't know if this is the situation here.

Anyway, you can test the quirk by adding

nvme_core.default_ps_max_latency_us=0

to the kernel command line. If this resolves it, I can spin a kernel patch and forward it upstream. In that case I would also need the output of 'nvme id-ctrl /dev/nvme0', please.
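To confirm that the parameter actually reached the running kernel, the command line can be checked after a reboot. The following is only a sketch: kernel_param and read_running_cmdline are hypothetical helper names, and treating the last occurrence of a parameter as the effective one is an assumption of this example.

```python
def kernel_param(cmdline, name):
    """Look up a parameter in a kernel command line string.

    Returns the value for 'name=value', an empty string for a bare flag,
    and None when the parameter is absent. If it appears more than once,
    the last occurrence is returned (assumption of this sketch).
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return handle.read().strip()


# Example using a command line of the shape reported later in this bug.
example = ("BOOT_IMAGE=/boot/vmlinuz-6.7.7-1-default "
           "root=UUID=85486fcd-23d7-43b7-8be3-ad9a2ff0797a splash=silent "
           "mitigations=auto quiet security=apparmor "
           "nvme_core.default_ps_max_latency_us=0")
print(kernel_param(example, "nvme_core.default_ps_max_latency_us"))
```

On the live system, kernel_param(read_running_cmdline(), "nvme_core.default_ps_max_latency_us") should return "0" once the parameter is in effect. The value the driver is using should also be visible in /sys/module/nvme_core/parameters/default_ps_max_latency_us, assuming the nvme_core module exposes it on your kernel.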
(In reply to Daniel Wagner from comment #7)
> Unfortunately, some manufacturers are not so keen on updating consumer
> devices. I don't know if this is the situation here.
>
> Anyway, you can test the quirk by adding
>
> nvme_core.default_ps_max_latency_us=0
>
> to the kernel command line. If this resolves it, I can spin a kernel patch
> and forward it upstream. In that case I would also need the output
> of 'nvme id-ctrl /dev/nvme0', please.

OK, I will try. I will have trouble triggering this bug again because the SDMA0 bug seems to be triggered more often than this pcieport error.

On another note, I was hoping the Steam Deck and similar handhelds were an enticing enough market for WD to devote engineers to ensuring decent quality.

Thanks. I will boot with that cmdline and take a look.
Created attachment 873477 [details]
kernel-default-6.7.7 with nvme_core.default_ps_max_latency_us=0

(In reply to Daniel Wagner from comment #7)
> Unfortunately, some manufacturers are not so keen on updating consumer
> devices. I don't know if this is the situation here.
>
> Anyway, you can test the quirk by adding
>
> nvme_core.default_ps_max_latency_us=0
>
> to the kernel command line. If this resolves it, I can spin a kernel patch
> and forward it upstream. In that case I would also need the output
> of 'nvme id-ctrl /dev/nvme0', please.

Information for package kernel-default:
---------------------------------------
Repository : openSUSE-Tumbleweed-Oss
Name : kernel-default
Version : 6.7.7-1.1
Arch : x86_64
Vendor : openSUSE
Installed Size : 239.6 MiB
Installed : Yes
Status : up-to-date
Source package : kernel-default-6.7.7-1.1.nosrc
Upstream URL : https://www.kernel.org/
Summary : The Standard Kernel
Description :
    The standard kernel for both uniprocessor and multiprocessor systems.

    Source Timestamp: 2024-03-01 13:51:21 +0000
    GIT Revision: 1ff84c539098385746e3fa3aaf975296fb8e6791
    GIT Branch: stable

I am going to remove it from my kernel cmdline args.

BOOT_IMAGE=/boot/vmlinuz-6.7.7-1-default root=UUID=85486fcd-23d7-43b7-8be3-ad9a2ff0797a splash=silent mitigations=auto quiet security=apparmor nvme_core.default_ps_max_latency_us=0
Created attachment 873478 [details] WD_BLACK SN770M 1TB - info dump