Bug 1218552 - [ 402.012325] pcieport 0000:00:01.2: AER: Corrected error received: 0000:01:00.0 - Steam Deck
Summary: [ 402.012325] pcieport 0000:00:01.2: AER: Corrected error received: 0000:01:...
Status: NEW
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
Version: Current
Hardware: x86-64 openSUSE Tumbleweed
Priority: P5 - None, Severity: Normal
Target Milestone: ---
Assignee: Daniel Wagner
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-05 01:38 UTC by ted chang
Modified: 2024-03-13 13:13 UTC (History)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
dmesg-pci-eror (462.11 KB, text/plain)
2024-01-05 01:38 UTC, ted chang
kernel-default-6.7.7 with nvme_core.default_ps_max_latency_us=0 (119.05 KB, text/plain)
2024-03-13 13:04 UTC, ted chang
WD_BLACK SN770M 1TB - info dump (2.45 KB, text/plain)
2024-03-13 13:10 UTC, ted chang

Description ted chang 2024-01-05 01:38:51 UTC
Created attachment 871668 [details]
dmesg-pci-eror

Hi everyone,

Every once in a while, this Steam Deck prints this error and the system drops me to a TTY. I do not know why. I did replace the SSD with a WD Black, as listed below. I do not know how to reproduce the error.

lsb_release -a
LSB Version:	n/a
Distributor ID:	openSUSE
Description:	openSUSE Tumbleweed
Release:	20231228
Codename:	n/a

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
	Vendor: Valve
	Version: F7A0120
	Release Date: 12/01/2023
	Address: 0xE0000
	Runtime Size: 128 kB
	ROM Size: 16 MB

Information for package kernel-default:
---------------------------------------
Repository     : openSUSE-Tumbleweed-Oss
Name           : kernel-default
Version        : 6.6.7-1.1
Arch           : x86_64
Vendor         : openSUSE
Installed Size : 238.1 MiB
Installed      : Yes
Status         : up-to-date
Source package : kernel-default-6.6.7-1.1.nosrc
Upstream URL   : https://www.kernel.org/
Summary        : The Standard Kernel
Description    : 
    The standard kernel for both uniprocessor and multiprocessor systems.


    Source Timestamp: 2023-12-14 17:36:48 +0000
    GIT Revision: 6869d093e8485475463bc171d23d7c4142fb6fa4
    GIT Branch: stable


=== START OF INFORMATION SECTION ===
Model Number:                       WD_BLACK SN770M 1TB
Serial Number:                      233101400993
Firmware Version:                   731100WD
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 4a48dc08dc
Local Time is:                      Thu Jan  4 17:36:48 2024 PST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x7e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg Log0_FISE_MI Telmtry_Ar_4


=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        36 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    3,267,997 [1.67 TB]
Data Units Written:                 3,844,737 [1.96 TB]
Host Read Commands:                 22,073,106
Host Write Commands:                62,512,660
Controller Busy Time:               79
Power Cycles:                       576
Power On Hours:                     46
Unsafe Shutdowns:                   123



[  402.012325] pcieport 0000:00:01.2: AER: Corrected error received: 0000:01:00.0
[  402.012342] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  402.012346] nvme 0000:01:00.0:   device [15b7:5042] error status/mask=00000001/0000e000
[  402.012351] nvme 0000:01:00.0:    [ 0] RxErr                 
[  421.302005] usb 3-1.1: new full-speed USB device
Comment 1 ted chang 2024-01-05 16:49:45 UTC
lspci -vnn

00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Root Complex [1022:1645]
	Subsystem: Valve Software Device [1e44:1776]
	Flags: fast devsel

00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] VanGogh IOMMU [1022:1646]
	Subsystem: Valve Software Device [1e44:1776]
	Flags: bus master, fast devsel, latency 0, IRQ -2147483648
	Capabilities: <access denied>

00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
	Flags: fast devsel, IOMMU group 0

00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh PCIe GPP Bridge [1022:1647] (prog-if 00 [Normal decode])
	Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
	Flags: bus master, fast devsel, latency 0, IRQ 28, IOMMU group 1
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: [disabled] [32-bit]
	Memory behind bridge: 80600000-806fffff [size=1M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
	Capabilities: <access denied>
	Kernel driver in use: pcieport

00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh PCIe GPP Bridge [1022:1647] (prog-if 00 [Normal decode])
	Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
	Flags: bus master, fast devsel, latency 0, IRQ 29, IOMMU group 2
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: [disabled] [32-bit]
	Memory behind bridge: 80500000-805fffff [size=1M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
	Capabilities: <access denied>
	Kernel driver in use: pcieport

00:01.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh PCIe GPP Bridge [1022:1647] (prog-if 00 [Normal decode])
	Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
	Flags: bus master, fast devsel, latency 0, IRQ 30, IOMMU group 3
	Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
	I/O behind bridge: 2000-2fff [size=4K] [16-bit]
	Memory behind bridge: 80400000-804fffff [size=1M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
	Capabilities: <access denied>
	Kernel driver in use: pcieport

00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
	Flags: fast devsel, IOMMU group 4

00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648] (prog-if 00 [Normal decode])
	Subsystem: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648]
	Flags: bus master, fast devsel, latency 0, IRQ 31, IOMMU group 4
	Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
	I/O behind bridge: 1000-1fff [size=4K] [16-bit]
	Memory behind bridge: 80000000-803fffff [size=4M] [32-bit]
	Prefetchable memory behind bridge: f8e0000000-f8f01fffff [size=258M] [32-bit]
	Capabilities: <access denied>
	Kernel driver in use: pcieport

00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648] (prog-if 00 [Normal decode])
	Subsystem: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648]
	Flags: bus master, fast devsel, latency 0, IRQ 32, IOMMU group 4
	Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
	I/O behind bridge: [disabled] [32-bit]
	Memory behind bridge: [disabled] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
	Capabilities: <access denied>
	Kernel driver in use: pcieport

00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648] (prog-if 00 [Normal decode])
	Subsystem: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648]
	Flags: bus master, fast devsel, latency 0, IRQ 33, IOMMU group 4
	Bus: primary=00, secondary=06, subordinate=06, sec-latency=0
	I/O behind bridge: [disabled] [32-bit]
	Memory behind bridge: [disabled] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
	Capabilities: <access denied>
	Kernel driver in use: pcieport

00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 71)
	Subsystem: Valve Software Device [1e44:1776]
	Flags: 66MHz, medium devsel, IOMMU group 5
	Kernel driver in use: piix4_smbus
	Kernel modules: i2c_piix4, sp5100_tco

00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
	Subsystem: Valve Software Device [1e44:1776]
	Flags: bus master, 66MHz, medium devsel, latency 0, IOMMU group 5

00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 0 [1022:1660]
	Flags: fast devsel, IOMMU group 6

00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 1 [1022:1661]
	Flags: fast devsel, IOMMU group 6

00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 2 [1022:1662]
	Flags: fast devsel, IOMMU group 6

00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 3 [1022:1663]
	Flags: fast devsel, IOMMU group 6

00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 4 [1022:1664]
	Flags: fast devsel, IOMMU group 6

00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 5 [1022:1665]
	Flags: fast devsel, IOMMU group 6

00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 6 [1022:1666]
	Flags: fast devsel, IOMMU group 6

00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 7 [1022:1667]
	Flags: fast devsel, IOMMU group 6

01:00.0 Non-Volatile memory controller [0108]: Sandisk Corp Device [15b7:5042] (rev 01) (prog-if 02 [NVM Express])
	Subsystem: Sandisk Corp Device [15b7:5042]
	Flags: bus master, fast devsel, latency 0, IRQ 49, IOMMU group 7
	Memory at 80600000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: nvme
	Kernel modules: nvme

02:00.0 SD Host controller [0805]: O2 Micro, Inc. SD/MMC Card Reader Controller [1217:8621] (rev 01) (prog-if 01)
	Subsystem: Valve Software Device [1e44:1776]
	Flags: bus master, fast devsel, latency 0, IRQ 39, IOMMU group 8
	Memory at 80501000 (32-bit, non-prefetchable) [size=4K]
	Memory at 80500000 (32-bit, non-prefetchable) [size=2K]
	Capabilities: <access denied>
	Kernel driver in use: sdhci-pci
	Kernel modules: sdhci_pci

03:00.0 Network controller [0280]: Realtek Semiconductor Co., Ltd. RTL8822CE 802.11ac PCIe Wireless Network Adapter [10ec:c822]
	DeviceName: Broadcom 5762
	Subsystem: AzureWave Device [1a3b:4210]
	Flags: bus master, fast devsel, latency 0, IRQ 73, IOMMU group 9
	I/O ports at 2000 [size=256]
	Memory at 80400000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: <access denied>
	Kernel driver in use: rtw_8822ce
	Kernel modules: rtw88_8822ce

04:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] VanGogh [AMD Custom GPU 0405] [1002:163f] (rev ae) (prog-if 00 [VGA controller])
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:0123]
	Flags: bus master, fast devsel, latency 0, IRQ 40, IOMMU group 4
	Memory at f8e0000000 (64-bit, prefetchable) [size=256M]
	Memory at f8f0000000 (64-bit, prefetchable) [size=2M]
	I/O ports at 1000 [size=256]
	Memory at 80300000 (32-bit, non-prefetchable) [size=512K]
	Capabilities: <access denied>
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu

04:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller [1002:1640]
	Subsystem: Valve Software Device [1e44:1776]
	Flags: bus master, fast devsel, latency 0, IRQ 72, IOMMU group 4
	Memory at 803c0000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel

04:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 19h PSP/CCP [1022:1649]
	Subsystem: Valve Software Device [1e44:1776]
	Flags: bus master, fast devsel, latency 0, IRQ 35, IOMMU group 4
	Memory at 80200000 (32-bit, non-prefetchable) [size=1M]
	Memory at 803c4000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: <access denied>
	Kernel driver in use: ccp
	Kernel modules: ccp

04:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] VanGogh USB0 [1022:163a] (prog-if fe [USB Device])
	Subsystem: Valve Software Device [1e44:1776]
	Flags: bus master, fast devsel, latency 0, IRQ 69, IOMMU group 4
	Memory at 80000000 (64-bit, non-prefetchable) [size=1M]
	Capabilities: <access denied>
	Kernel driver in use: dwc3-pci
	Kernel modules: dwc3_pci

04:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] VanGogh USB1 [1022:163b] (prog-if 30 [XHCI])
	Subsystem: Valve Software Device [1e44:1776]
	Flags: bus master, fast devsel, latency 0, IRQ 40, IOMMU group 4
	Memory at 80100000 (64-bit, non-prefetchable) [size=1M]
	Capabilities: <access denied>
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci

04:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 50)
	Subsystem: Valve Software Device [1e44:1776]
	Flags: bus master, fast devsel, latency 0, IRQ 70, IOMMU group 4
	Memory at 80380000 (32-bit, non-prefetchable) [size=256K]
	Capabilities: <access denied>
	Kernel driver in use: snd_pci_acp5x
	Kernel modules: snd_pci_acp3x, snd_rn_pci_acp3x, snd_pci_acp5x, snd_pci_acp6x, snd_acp_pci, snd_rpl_pci_acp6x, snd_pci_ps, snd_sof_amd_renoir, snd_sof_amd_rembrandt, snd_sof_amd_vangogh

05:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a] (rev 61)
	Subsystem: Valve Software Device [1e44:1776]
	Flags: fast devsel, IOMMU group 4
	Capabilities: <access denied>

06:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
	Subsystem: Valve Software Device [1e44:1776]
	Flags: fast devsel, IOMMU group 4
	Capabilities: <access denied>
Comment 2 Daniel Wagner 2024-01-17 16:29:52 UTC
Random idea: disable the power-save modes on the PCI link if they are enabled.

nvme_core.default_ps_max_latency_us=0

Some details on this topic:

https://unix.stackexchange.com/questions/612096/clarifying-nvme-apst-problems-for-linux
Comment 3 ted chang 2024-01-17 23:51:37 UTC
(In reply to Daniel Wagner from comment #2)
> Random idea, disable the power safe modes on the pci link if they are
> enabled.
> 
> nvme_core.default_ps_max_latency_us=0
> 
> Some details on this topic:
> 
> https://unix.stackexchange.com/questions/612096/clarifying-nvme-apst-
> problems-for-linux

Are you looking for something in particular? Are you waiting until I see AER: Corrected error received: 0000:01:00.0 again?
Comment 4 Daniel Wagner 2024-01-18 08:12:05 UTC
No, I just gave you a tip on how you could work around the problem.

Looking at it again, the AER message is reporting that a transport error
has been corrected (ECC). This is good, everything is working as expected.

According to the upstream PCI maintainers, you can disable this message
(feature?) by adding

   "pci=noaer"

as a boot parameter, though I wouldn't recommend doing this.
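
For reference, the "status/mask=00000001/0000e000" line from the dmesg output in the description can be decoded bit by bit. The following is only a minimal sketch: the bit positions are those of the PCIe AER Correctable Error Status register and the short names mirror the strings the Linux AER driver prints, but the helper names (decode_aer_correctable, parse_status_mask) are made up for this illustration and are not part of any tool.

```python
# Correctable Error Status bit positions (PCIe AER capability).
# Short names follow the strings printed by the Linux AER driver.
AER_CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error (physical layer)
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode_aer_correctable(value_hex):
    """Return the names of all bits set in a hex status or mask value."""
    value = int(value_hex, 16)
    return [name for bit, name in sorted(AER_CORRECTABLE_BITS.items())
            if value & (1 << bit)]


def parse_status_mask(line):
    """Hypothetical helper: split 'status/mask=AAAA/BBBB' out of a dmesg line."""
    field = line.split("status/mask=")[1].split()[0]
    status, mask = field.split("/")
    return decode_aer_correctable(status), decode_aer_correctable(mask)


line = ("nvme 0000:01:00.0:   device [15b7:5042] "
        "error status/mask=00000001/0000e000")
reported, masked = parse_status_mask(line)
print("reported:", reported)
print("masked:  ", masked)
```

With the values from this report, only bit 0 (RxErr) is set in the status, which matches the "[ 0] RxErr" line in the log.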
Comment 5 Daniel Wagner 2024-01-18 09:34:18 UTC
BTW, you could still try to disable the power-save modes and see if this
makes the ECC go away. Some WDC devices need the NVME_QUIRK_NO_DEEPEST_PS
quirk; maybe this device is one of them.
Comment 6 ted chang 2024-01-18 17:12:01 UTC
(In reply to Daniel Wagner from comment #5)
> BTW, you could still try to disable the powersafe modes and see if this
> makes the ECC go away. Some WDC devices need the NVME_QUIRK_NO_DEEPEST_PS
> quirk, maybe this device is one of these.

Hmmm. I contacted WD and they told me I am running the newest firmware. I asked them whether they could forward my information to their engineers to fix this SSD. I am a direct consumer after all, and they did advertise that this SSD works on Steam Decks.

I might try that quirk in the future.
Comment 7 Daniel Wagner 2024-01-18 17:36:12 UTC
Unfortunately, some manufacturers are not so keen on updating consumer
devices. I don't know if this is the situation here.

Anyway, you can test the quirk by adding

  nvme_core.default_ps_max_latency_us=0

to the kernel command line. If this resolves it, I can spin a kernel patch
and forward it upstream. In that case I would also need the output
of 'nvme id-ctrl /dev/nvme0', please.
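
To confirm that the parameter actually took effect after a reboot, something like the sketch below could be used. It assumes the usual locations /proc/cmdline and /sys/module/nvme_core/parameters/default_ps_max_latency_us; the function names are hypothetical and only meant as an illustration, not as a supported tool.

```python
from pathlib import Path

PARAM = "nvme_core.default_ps_max_latency_us"
# Assumed sysfs location of the live module parameter.
SYSFS_PARAM = Path("/sys/module/nvme_core/parameters/default_ps_max_latency_us")


def cmdline_value(cmdline, name=PARAM):
    """Return the value of 'name=value' on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val  # keep the last occurrence if it is given twice
    return value


def read_text_or_none(path):
    """Read a small text file, returning None if it is not available."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    boot_cmdline = read_text_or_none(Path("/proc/cmdline")) or ""
    print("on command line:", cmdline_value(boot_cmdline))
    print("live value:     ", read_text_or_none(SYSFS_PARAM))
```

If both lines report 0, the kernel was booted with the setting and the nvme_core module is using it.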
Comment 8 ted chang 2024-01-18 18:13:50 UTC
(In reply to Daniel Wagner from comment #7)
> Unfortunately, some manufactures are not so keen on updating consumer 
> devices. Don't know if this is the situation here. 
> 
> Anyway, you can test the quirk by adding
> 
>   nvme_core.default_ps_max_latency_us=0
> 
> to kernel command line. If this resolves it, I can spin a kernel patch
> and forward it upstream. In this case I would need also the output
> of 'nvme id-ctrl /dev/nvme0' please.

OK, I will try. I will have trouble triggering this bug again because the SDMA0 bug seems to be triggered more often than this pcieport error.

On another note, I was hoping the Steam Deck and related handhelds were an enticing enough market for WD to devote engineers to ensuring decent quality. Thanks.

I will add it to the cmdline and take a look.
Comment 9 ted chang 2024-03-13 13:04:15 UTC
Created attachment 873477 [details]
kernel-default-6.7.7  with nvme_core.default_ps_max_latency_us=0

(In reply to Daniel Wagner from comment #7)
> Unfortunately, some manufactures are not so keen on updating consumer 
> devices. Don't know if this is the situation here. 
> 
> Anyway, you can test the quirk by adding
> 
>   nvme_core.default_ps_max_latency_us=0
> 
> to kernel command line. If this resolves it, I can spin a kernel patch
> and forward it upstream. In this case I would need also the output
> of 'nvme id-ctrl /dev/nvme0' please.

Information for package kernel-default:
---------------------------------------
Repository     : openSUSE-Tumbleweed-Oss
Name           : kernel-default
Version        : 6.7.7-1.1
Arch           : x86_64
Vendor         : openSUSE
Installed Size : 239.6 MiB
Installed      : Yes
Status         : up-to-date
Source package : kernel-default-6.7.7-1.1.nosrc
Upstream URL   : https://www.kernel.org/
Summary        : The Standard Kernel
Description    : 
    The standard kernel for both uniprocessor and multiprocessor systems.


    Source Timestamp: 2024-03-01 13:51:21 +0000
    GIT Revision: 1ff84c539098385746e3fa3aaf975296fb8e6791
    GIT Branch: stable

I am going to remove it from my kernel cmdline args

BOOT_IMAGE=/boot/vmlinuz-6.7.7-1-default root=UUID=85486fcd-23d7-43b7-8be3-ad9a2ff0797a splash=silent mitigations=auto quiet security=apparmor nvme_core.default_ps_max_latency_us=0
Comment 10 ted chang 2024-03-13 13:10:55 UTC
Created attachment 873478 [details]
WD_BLACK SN770M 1TB - info dump