Bugzilla – Bug 1228093
[amdgpu] Secondary monitor does not come up with 6.10
Last modified: 2024-10-05 05:25:04 UTC
I bisected the issue to:
commit 8b2cb32cf0c613fd937ebb49a331798985f50826
Author: Hersen Wu <hersenxs.wu@amd.com>
Date:   Mon Mar 11 18:18:34 2024 -0400
    drm/amd/display: FEC overhead should be checked once for mst slot nums
Now going to revert in stable temporarily and report to upstream.
The monitor simply does not come up in wayland-plasma6 (it does in console). It appears as if it was there (windows open there and mouse cursor can go there), but the monitor is DPMS off. There is no difference in dmesg regarding [drm].
Reverting the above commit on the top of 6.10 makes it work again.
git bisect log for reference:
> # bad: [0c3836482481200ead7b416ca80c68a29cfdaabd] Linux 6.10
> # good: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
> git bisect start 'v6.10' 'v6.9' '--' 'drivers/gpu/drm/amd/'
> # bad: [27e718ac8b8194d13eee5738c4d3fd247736186e] drm/amd/display: fix disable otg wa logic in DCN316
> git bisect bad 27e718ac8b8194d13eee5738c4d3fd247736186e
> # good: [20fd14460f45a01b9ec63aa7b12e6c3c66e54fa7] drm/amdgpu: Fix 'fw_name' buffer size to prevent truncations in amdgpu_mes_init_microcode
> git bisect good 20fd14460f45a01b9ec63aa7b12e6c3c66e54fa7
> # bad: [14f9db4271ef5c78ae87237af844f03fb192d139] drm/amd/display: Enable DTBCLK DTO earlier in the sequence
> git bisect bad 14f9db4271ef5c78ae87237af844f03fb192d139
> # good: [1c5c36530a573de1a4b647b7d8c36f3b298e60ed] drm/amd/display: Set DCN351 BB and IP the same as DCN35
> git bisect good 1c5c36530a573de1a4b647b7d8c36f3b298e60ed
> # good: [d045f4ad7700c271fa1278b78ef7722f833a8068] drm/amd/swsmu: Update smu v14.0.0 headers to be 14.0.1 compatible
> git bisect good d045f4ad7700c271fa1278b78ef7722f833a8068
> # good: [029faefb7302f1079173410697b0e14d2e56e19a] drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2
> git bisect good 029faefb7302f1079173410697b0e14d2e56e19a
> # bad: [b7a1a0ef12b81957584fef7b61e2d5ec049c7209] drm/amd/amdgpu: add pipe1 hardware support
> git bisect bad b7a1a0ef12b81957584fef7b61e2d5ec049c7209
> # bad: [60df5628144b59d5876f8ceac624a7661c336665] drm/amd/display: handle invalid connector indices
> git bisect bad 60df5628144b59d5876f8ceac624a7661c336665
> # bad: [8b2cb32cf0c613fd937ebb49a331798985f50826] drm/amd/display: FEC overhead should be checked once for mst slot nums
> git bisect bad 8b2cb32cf0c613fd937ebb49a331798985f50826
> # first bad commit: [8b2cb32cf0c613fd937ebb49a331798985f50826] drm/amd/display: FEC overhead should be checked once for mst slot nums
The external monitor is connected via Lenovo dock (via Thunderbolt) by an HDMI cable.
The card in question:
64:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Phoenix1 [1002:15bf] (rev dd) (prog-if 00 [VGA controller])
    Subsystem: Lenovo Device [17aa:50da]
    Flags: bus master, fast devsel, latency 0, IRQ 57, IOMMU group 16
    Memory at 2400000000 (64-bit, prefetchable) [size=256M]
    Memory at 78000000 (64-bit, prefetchable) [size=2M]
    I/O ports at 1000 [size=256]
    Memory at 78500000 (32-bit, non-prefetchable) [size=512K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: [64] Express Legacy Endpoint, IntMsgNum 0
    Capabilities: [a0] MSI: Enable- Count=1/4 Maskable- 64bit+
    Capabilities: [c0] MSI-X: Enable+ Count=4 Masked-
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [270] Secondary PCI Express
    Capabilities: [2a0] Access Control Services
    Capabilities: [2b0] Address Translation Service (ATS)
    Capabilities: [2c0] Page Request Interface (PRI)
    Capabilities: [2d0] Process Address Space ID (PASID)
    Capabilities: [410] Physical Layer 16.0 GT/s <?>
    Capabilities: [450] Lane Margining at the Receiver
    Kernel driver in use: amdgpu
Hi, I've been made aware of this ticket in a forum topic I opened. Hopefully I can help but my issue is linked to Kernel 6.9.3+ and not 6.10 and comes up already at boot.
Topic: https://forums.opensuse.org/t/system-crashes-when-second-daisy-chained-monitor-is-attached-with-amd-gpu-with-kernel-6-9-3/176886
System:
  Kernel: 6.9.7-1-default arch: x86_64 bits: 64 compiler: gcc v: 13.3.0 clocksource: tsc avail: hpet,acpi_pm
    parameters: initrd=\opensuse-tumbleweed\6.9.7-1-default\initrd-78cac3084ea8018dc0df08f7fd3831a49a0967c4 root=UUID=[REDACTED] splash=silent quiet security=apparmor mitigations=auto systemd.machine_id=[REDACTED]
  Desktop: KDE Plasma v: 6.1.2 tk: Qt v: N/A info: frameworks v: 6.3.0 wm: kwin_x11 tools: avail: xscreensaver vt: 2 dm: SDDM
  Distro: openSUSE Tumbleweed 20240712
Graphics:
  Device-1: AMD Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT] vendor: XFX driver: amdgpu v: kernel arch: RDNA-2 code: Navi-2x process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 4 speed: 16 GT/s lanes: 16 ports: active: DP-4 empty: DP-1, DP-2, DP-3, DP-5, HDMI-A-1, Writeback-1 bus-ID: 2d:00.0 chip-ID: 1002:73df class-ID: 0300
  Display: x11 server: X.Org v: 21.1.12 with: Xwayland v: 24.1.0 compositor: kwin_x11 driver: X: loaded: modesetting unloaded: fbdev,vesa dri: radeonsi gpu: amdgpu display-ID: :0 screens: 1
  Screen-1: 0 s-res: 2560x1440 s-dpi: 96 s-size: 677x381mm (26.65x15.00") s-diag: 777mm (30.58")
  Monitor-1: DP-4 model: HP Z27u G3 serial: <filter> built: 2021 res: 2560x1440 hz: 60 dpi: 109 gamma: 1.2 size: 597x336mm (23.5x13.23") diag: 685mm (27") ratio: 16:9 modes: max: 2560x1440 min: 720x400
  API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi device: 1 drv: swrast surfaceless: drv: radeonsi x11: drv: radeonsi inactive: gbm,wayland
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 24.1.3 glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX 6700 XT (radeonsi navi22 LLVM 18.1.8 DRM 3.57 6.9.7-1-default) device-ID: 1002:73df memory: 11.72 GiB unified: no
  API: Vulkan v: 1.3.283 layers: 5 device: 0 type: discrete-gpu name: AMD Radeon RX 6700 XT (RADV NAVI22) driver: N/A device-ID: 1002:73df surfaces: xcb,xlib
(In reply to Daniel Schemp from comment #3)
> Topic:
> https://forums.opensuse.org/t/system-crashes-when-second-daisy-chained-monitor-is-attached-with-amd-gpu-with-kernel-6-9-3/176886
That'd be a different issue. This bug is in 6.10 only. Please create a new bug. Ideally at https://gitlab.freedesktop.org/drm/amd/-/issues, so that upstream devs are made aware of the issue (or you will be pointed to some preexisting bug).
(In reply to Jiri Slaby from comment #4)
> (In reply to Daniel Schemp from comment #3)
> > Topic:
> > https://forums.opensuse.org/t/system-crashes-when-second-daisy-chained-monitor-is-attached-with-amd-gpu-with-kernel-6-9-3/176886
>
> That'd be a different issue. This bug is in 6.10 only. Please create a new
> bug. Ideally at https://gitlab.freedesktop.org/drm/amd/-/issues, so that
> upstream devs are made aware of the issue (or you will be pointed to some
> preexisting bug).
And it might be worth testing 6.10 first. E.g. from: https://download.opensuse.org/repositories/Kernel:/stable/standard/ (this bug is fixed there)
This is an autogenerated message for OBS integration: This bug (1228093) was mentioned in https://build.opensuse.org/request/show/1188940 Factory / kernel-source
This is an autogenerated message for OBS integration: This bug (1228093) was mentioned in https://build.opensuse.org/request/show/1189502 Factory / kernel-source
Created attachment 876262 [details] patch
This is an autogenerated message for OBS integration: This bug (1228093) was mentioned in https://build.opensuse.org/request/show/1189731 Factory / kernel-source
Pushed to stable + master.
(In reply to Jiri Slaby from comment #8)
> Created attachment 876262 [details]
> patch
I think this is what crashed my 6.11-rc1 from master on the T14s gen3 laptop with a dock and external monitor. See the oops: https://paste.opensuse.org/pastes/d8a33a929c71
excerpt:
BUG: unable to handle page fault for address: 00000000000012b8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 6 UID: 0 PID: 2541 Comm: Xorg.bin Not tainted 6.11.0-rc1-1.gc7e21a2-default #1 openSUSE Tumbleweed (unreleased) 59ccf8feca6c7>
Hardware name: LENOVO 21CRS0K63K/21CRS0K63K, BIOS R22ET70W (1.40 ) 03/21/2024
RIP: 0010:compute_mst_dsc_configs_for_link+0x577/0xa90 [amdgpu]
Code: 63 56 20 48 8d 2c c7 48 b8 cf f7 53 e3 a5 9b c4 20 48 69 d2 ee 03 00 00 48 c1 ea 03 48 f7 e2 49 8b 45 40 48 89 d1 48 c1 e9 0>
RSP: 0018:ffffa37d4121f698 EFLAGS: 00010216
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000dad5a0
RDX: 000000000dad5a00 RSI: 000000000047747f RDI: ffffa37d4121f9e8
RBP: ffffa37d4121f9e8 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
R13: ffffa37d4121f790 R14: ffffa37d4121f748 R15: 0000000000000000
FS: 00007f1ad194edc0(0000) GS:ffff88e6aed00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000012b8 CR3: 000000011675a000 CR4: 0000000000750ef0
objdump tells me (with RIP being 340627):
/usr/src/debug/kernel-default-6.11~rc1/linux-6.11-rc1/linux-obj/../include/linux/math64.h:29
  340620: 48 89 d1 mov %rdx,%rcx
  340623: 48 c1 e9 04 shr $0x4,%rcx
kbps_to_peak_pbn():
/usr/src/debug/kernel-default-6.11~rc1/linux-6.11-rc1/linux-obj/../drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_mst_types.c:814
  340627: 80 b8 b8 12 00 00 00 cmpb $0x0,0x12b8(%rax)
  34062e: 0f 85 00 00 00 00 jne 340634 <compute_mst_dsc_configs_for_link+0x584>
      340630: R_X86_64_PC32 .text.unlikely+0x285f7
/usr/src/debug/kernel-default-6.11~rc1/linux-6.11-rc1/linux-obj/../drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_mst_types.c:819
  340634: 48 c1 e1 06 shl $0x6,%rcx
That's kbps_to_peak_pbn() on line if (aconnector->is_synaptics_cascaded), RAX is zero and pahole tells me is_synaptics_cascaded is indeed at offset 0x12b8. So aconnector is null. I don't know yet which of the several callsites to kbps_to_peak_pbn() this is.
(In reply to Vlastimil Babka from comment #11)
> I don't know yet which of the several callsites to kbps_to_peak_pbn() this
> is.
The closest preceding one in the objdump (unless it's too shuffled) is try_disable_dsc():
/usr/src/debug/kernel-default-6.11~rc1/linux-6.11-rc1/linux-obj/../drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_mst_types.c:1048
    vars[next_index].pbn = kbps_to_peak_pbn(params[next_index].bw_range.stream_kbps, params[i].aconnector);
Let's resort back to the revert which upstream is likely going to do: https://lore.kernel.org/all/CO6PR12MB5489857D91F3CDC7F7517D02FCB02@CO6PR12MB5489.namprd12.prod.outlook.com/
Pushed to master+stable.
This is an autogenerated message for OBS integration: This bug (1228093) was mentioned in https://build.opensuse.org/request/show/1191566 Factory / kernel-source
Mainline commit 338567d17627 ("drm/amd/display: Fix MST BW calculation Regression") in 6.11-rc4 which is supposed to revert commit 8b2cb32cf0c6 looks similar to this patch but there are differences. Someone who was affected by this issue should probably check that everything is OK with current master branch snapshot (based on v6.11-rc4).
(In reply to Michal Kubeček from comment #16)
> Mainline commit 338567d17627 ("drm/amd/display: Fix MST BW calculation
> Regression") in 6.11-rc4 which is supposed to revert commit 8b2cb32cf0c6
> looks similar to this patch but there are differences. Someone who was
> affected by this issue should probably check that everything is OK with
> current master branch snapshot (based on v6.11-rc4).
I was Reported-by in there but got no notification, weird.
The change appears to be wrong:
-+ vars[next_index].pbn = kbps_to_peak_pbn(params[next_index].bw_range.max_kbps, fec_overhead_multiplier_x1000);
++ vars[next_index].pbn = kbps_to_peak_pbn(params[next_index].bw_range.stream_kbps, fec_overhead_multiplier_x1000);
(In reply to Jiri Slaby from comment #17)
> The change appears to be wrong:
> -+ vars[next_index].pbn = kbps_to_peak_pbn(params[next_index].bw_range.max_kbps, fec_overhead_multiplier_x1000);
> ++ vars[next_index].pbn = kbps_to_peak_pbn(params[next_index].bw_range.stream_kbps, fec_overhead_multiplier_x1000);
Fixed exactly by: https://lore.kernel.org/all/20240815224525.3077505-13-Roman.Li@amd.com/
This is an autogenerated message for OBS integration: This bug (1228093) was mentioned in https://build.opensuse.org/request/show/1197685 Factory / kernel-source
This is an autogenerated message for OBS integration: This bug (1228093) was mentioned in https://build.opensuse.org/request/show/1198865 Factory / kernel-source
This is an autogenerated message for OBS integration: This bug (1228093) was mentioned in https://build.opensuse.org/request/show/1202559 Factory / kernel-source
This is an autogenerated message for OBS integration: This bug (1228093) was mentioned in https://build.opensuse.org/request/show/1203029 Factory / kernel-source
This is an autogenerated message for OBS integration: This bug (1228093) was mentioned in https://build.opensuse.org/request/show/1203745 Factory / kernel-source
This is an autogenerated message for OBS integration: This bug (1228093) was mentioned in https://build.opensuse.org/request/show/1205774 Factory / kernel-source