Bugzilla – Bug 1213578
OOPS in amdgpu
Last modified: 2024-06-25 17:50:56 UTC
I get an Oops with both 5.14.21-150500.55.7-default and Takashi's 5.14.21-150500.3.g62ee467-default.
Created attachment 868390: the two Oopses I could find in /var/log/messages
hwinfo --gfxcard
31: PCI 500.0: 0300 VGA compatible controller (VGA)
  [Created at pci.386]
  Unique ID: Ddhb.uZbpCsxmrO5
  Parent ID: JZZT.nyyq4tDu6x8
  SysFS ID: /devices/pci0000:00/0000:00:08.1/0000:05:00.0
  SysFS BusID: 0000:05:00.0
  Hardware Class: graphics card
  Model: "ATI Picasso"
  Vendor: pci 0x1002 "ATI Technologies Inc"
  Device: pci 0x15d8 "Picasso"
  SubVendor: pci 0x17aa "Lenovo"
  SubDevice: pci 0x5127
  Revision: 0xd1
  Driver: "amdgpu"
  Driver Modules: "amdgpu"
  Memory Range: 0xc0000000-0xcfffffff (ro,non-prefetchable)
  Memory Range: 0xd0000000-0xd01fffff (ro,non-prefetchable)
  I/O Ports: 0x1000-0x1fff (rw)
  Memory Range: 0xd0500000-0xd057ffff (rw,non-prefetchable)
  IRQ: 50 (no events)
  Module Alias: "pci:v00001002d000015D8sv000017AAsd00005127bc03sc00i00"
  Driver Info #0:
    Driver Status: amdgpu is active
    Driver Activation Cmd: "modprobe amdgpu"
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #25 (PCI bridge)

Primary display adapter: #31

# hwinfo --monitor
35: None 00.0: 10002 LCD Monitor
  [Created at monitor.125]
  Unique ID: rdCR.mQXMLz_WQq5
  Parent ID: Ddhb.uZbpCsxmrO5
  Hardware Class: monitor
  Model: "AUO LCD Monitor"
  Vendor: AUO "AUO"
  Device: eisa 0x573d
  Serial ID: "0"
  Resolution: 1920x1080@60Hz
  Size: 309x174 mm
  Year of Manufacture: 2018
  Week of Manufacture: 0
  Detailed Timings #0:
     Resolution: 1920x1080
     Horizontal: 1920 1936 1952 2080 (+16 +32 +160) -hsync
       Vertical: 1080 1083 1088 1142 (+3 +8 +62) -vsync
    Frequencies: 142.60 MHz, 68.56 kHz, 60.03 Hz
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #25 (VGA compatible controller)

36: None 01.0: 10002 LCD Monitor
  [Created at monitor.125]
  Unique ID: wkFv.zdQ3vHfjlr1
  Parent ID: Ddhb.uZbpCsxmrO5
  Hardware Class: monitor
  Model: "DELL U2419H"
  Vendor: DEL "DELL"
  Device: eisa 0x4148 "DELL U2419H"
  Serial ID: "5ZC7SS2"
  Resolution: 720x400@70Hz
  Resolution: 640x480@60Hz
  Resolution: 640x480@75Hz
  Resolution: 800x600@60Hz
  Resolution: 800x600@75Hz
  Resolution: 1024x768@60Hz
  Resolution: 1024x768@75Hz
  Resolution: 1280x1024@75Hz
  Resolution: 1152x864@75Hz
  Resolution: 1280x1024@60Hz
  Resolution: 1600x900@60Hz
  Resolution: 1920x1080@60Hz
  Size: 527x296 mm
  Year of Manufacture: 2019
  Week of Manufacture: 44
  Detailed Timings #0:
     Resolution: 1920x1080
     Horizontal: 1920 2008 2052 2200 (+88 +132 +280) +hsync
       Vertical: 1080 1084 1089 1125 (+4 +9 +45) +vsync
    Frequencies: 148.50 MHz, 67.50 kHz, 60.00 Hz
  Driver Info #0:
    Max. Resolution: 1920x1080
    Vert. Sync Range: 56-76 Hz
    Hor. Sync Range: 30-83 kHz
    Bandwidth: 148 MHz
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #25 (VGA compatible controller)

37: None 02.0: 10002 LCD Monitor
  [Created at monitor.125]
  Unique ID: +rIN.8N48X7gRWVA
  Parent ID: Ddhb.uZbpCsxmrO5
  Hardware Class: monitor
  Model: "DELL U2414H"
  Vendor: DEL "DELL"
  Device: eisa 0xa0b2 "DELL U2414H"
  Serial ID: "X4J717CQ18UL"
  Resolution: 720x400@70Hz
  Resolution: 640x480@60Hz
  Resolution: 640x480@75Hz
  Resolution: 800x600@60Hz
  Resolution: 800x600@75Hz
  Resolution: 1024x768@60Hz
  Resolution: 1024x768@75Hz
  Resolution: 1280x1024@75Hz
  Resolution: 1152x864@75Hz
  Resolution: 1280x1024@60Hz
  Resolution: 1600x900@60Hz
  Resolution: 1600x1200@60Hz
  Resolution: 1920x1080@60Hz
  Size: 527x296 mm
  Year of Manufacture: 2017
  Week of Manufacture: 52
  Detailed Timings #0:
     Resolution: 1920x1080
     Horizontal: 1920 2008 2052 2200 (+88 +132 +280) +hsync
       Vertical: 1080 1084 1089 1125 (+4 +9 +45) +vsync
    Frequencies: 148.50 MHz, 67.50 kHz, 60.00 Hz
  Driver Info #0:
    Max. Resolution: 1920x1080
    Vert. Sync Range: 56-76 Hz
    Hor. Sync Range: 30-83 kHz
    Bandwidth: 148 MHz
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #25 (VGA compatible controller)
Thanks. This looks like the upstream issue https://gitlab.freedesktop.org/drm/amd/-/issues/2314
I'm building yet another test kernel with some backports in OBS home:tiwai:bsc1213578. Please give it a try once the build finishes.
I'm also building two more test kernels in the OBS repos home:tiwai:bsc1213578-2 and home:tiwai:bsc1213578-3. The first contains another upstream fix; please test it anyway to check whether it introduces further regressions. The second is a downstream fix for the NULL dereferences and should at least work around the Oops. If the previous two kernels don't work, please check this one. If it is the only one that works, I'll add this workaround to the next update.
Thanks, Takashi! Waiting for the builds now...
Booted kernel-default-5.14.21-150500.1.1.g0e39bed.x86_64 from https://build.opensuse.org/repositories/home:tiwai:bsc1213578: it crashed when starting X11. No Oops after reboot. Now on to the next one...
I meant: no Oops found in /var/log/messages.
Created attachment 868394: dmesg from home:tiwai:bsc1213578-2
home:tiwai:bsc1213578-2 crashed when connecting the external monitors; attaching the dmesg output.
Created attachment 868395: dmesg from home:tiwai:bsc1213578-3
home:tiwai:bsc1213578-3 produces an OOPS as well, see dmesg attachment.
BUT: I report this now from the system with two external monitors attached, so it recovered. I booted up without external monitors and then connected them.
$ uname -a
Linux t495s 5.14.21-150500.1.g06f3d0e-default #1 SMP PREEMPT_DYNAMIC Mon Jul 24 08:36:58 UTC 2023 (06f3d0e) x86_64 x86_64 x86_64 GNU/Linux
(In reply to Andreas Jaeger from comment #10)
> Created attachment 868395
> dmesg from home:tiwai:bsc1213578-3
>
> home:tiwai:bsc1213578-3 produces an OOPS as well, see dmesg attachment.

Those are not real crashes, just kernel WARNINGs from ASSERT() macros. To be fixed, of course.

> BUT: I report this now from the system with two external monitors attached,
> so it recovered. I booted up without external monitors and then connected
> them.
>
> $ uname -a
> Linux t495s 5.14.21-150500.1.g06f3d0e-default #1 SMP PREEMPT_DYNAMIC Mon Jul
> 24 08:36:58 UTC 2023 (06f3d0e) x86_64 x86_64 x86_64 GNU/Linux

So, how does the *-3 kernel behave apart from those kernel warnings? Does it still show any other breakage?
The latest kernel initially had a network connection problem, and gnome-shell started without any extensions, which I was later able to enable. After that it worked fine for an hour until I rebooted. I don't know whether the network and gnome-shell problems were related to the kernel. Let me try that kernel again...
Rebooted, all fine. I will use it for the next 2 hours and report if any problems arise. No Oops/assert; this time I booted with the external monitors already attached.
uname -a
Linux t495s 5.14.21-150500.1.g06f3d0e-default #1 SMP PREEMPT_DYNAMIC Mon Jul 24 08:36:58 UTC 2023 (06f3d0e) x86_64 x86_64 x86_64 GNU/Linux
Are there any more bugs to be fixed with the latest SLE15-SP5 kernel? (Ideally, check with the kernel in the OBS Kernel:SLE15-SP5 repo.) If so, could you explain how to trigger them?
OK, I downloaded the kernel from OBS Kernel:SLE15-SP5; uname -a reports:
Linux t495s 5.14.21-150500.158.g6eb8d8a-default #1 SMP PREEMPT_DYNAMIC Thu Aug 3 12:29:06 UTC 2023 (6eb8d8a) x86_64 x86_64 x86_64 GNU/Linux
It booted up fine. I'll run it for a while and then report back. Thanks, Takashi!
Looking still fine!
OK, then let's close it now. Feel free to reopen if you hit the same bug (though it may be better to open a new entry, as it could be a different problem).
SUSE-SU-2023:3302-1: An update that solves 28 vulnerabilities, contains two features and has 115 fixes can now be installed.

Category: security (important)
Bug References: 1150305, 1187829, 1193629, 1194869, 1206418, 1207129, 1207894, 1207948, 1208788, 1210335, 1210565, 1210584, 1210627, 1210780, 1210825, 1210853, 1211014, 1211131, 1211243, 1211738, 1211811, 1211867, 1212051, 1212256, 1212265, 1212301, 1212445, 1212456, 1212502, 1212525, 1212603, 1212604, 1212685, 1212766, 1212835, 1212838, 1212842, 1212846, 1212848, 1212861, 1212869, 1212892, 1212901, 1212905, 1212961, 1213010, 1213011, 1213012, 1213013, 1213014, 1213015, 1213016, 1213017, 1213018, 1213019, 1213020, 1213021, 1213024, 1213025, 1213032, 1213034, 1213035, 1213036, 1213037, 1213038, 1213039, 1213040, 1213041, 1213059, 1213061, 1213087, 1213088, 1213089, 1213090, 1213092, 1213093, 1213094, 1213095, 1213096, 1213098, 1213099, 1213100, 1213102, 1213103, 1213104, 1213105, 1213106, 1213107, 1213108, 1213109, 1213110, 1213111, 1213112, 1213113, 1213114, 1213116, 1213134, 1213167, 1213205, 1213206, 1213226, 1213233, 1213245, 1213247, 1213252, 1213258, 1213259, 1213263, 1213264, 1213272, 1213286, 1213287, 1213304, 1213417, 1213493, 1213523, 1213524, 1213533, 1213543, 1213578, 1213585, 1213586, 1213588, 1213601, 1213620, 1213632, 1213653, 1213705, 1213713, 1213715, 1213747, 1213756, 1213759, 1213777, 1213810, 1213812, 1213856, 1213857, 1213863, 1213867, 1213870, 1213871, 1213872
CVE References: CVE-2022-40982, CVE-2023-0459, CVE-2023-1829, CVE-2023-20569, CVE-2023-20593, CVE-2023-21400, CVE-2023-2156, CVE-2023-2166, CVE-2023-2430, CVE-2023-2985, CVE-2023-3090, CVE-2023-31083, CVE-2023-3111, CVE-2023-3117, CVE-2023-31248, CVE-2023-3212, CVE-2023-3268, CVE-2023-3389, CVE-2023-3390, CVE-2023-35001, CVE-2023-3567, CVE-2023-3609, CVE-2023-3611, CVE-2023-3776, CVE-2023-3812, CVE-2023-38409, CVE-2023-3863, CVE-2023-4004
Jira References: PED-4718, PED-4758
Sources used:
openSUSE Leap 15.5 (src): kernel-livepatch-SLE15-SP5-RT_Update_3-1-150500.11.5.1, kernel-syms-rt-5.14.21-150500.13.11.1, kernel-source-rt-5.14.21-150500.13.11.1
SUSE Linux Enterprise Live Patching 15-SP5 (src): kernel-livepatch-SLE15-SP5-RT_Update_3-1-150500.11.5.1
SUSE Real Time Module 15-SP5 (src): kernel-syms-rt-5.14.21-150500.13.11.1, kernel-source-rt-5.14.21-150500.13.11.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
SUSE-SU-2023:3311-1: An update that solves 15 vulnerabilities and has 27 fixes can now be installed.

Category: security (important)
Bug References: 1206418, 1207129, 1207948, 1210627, 1210780, 1210825, 1211131, 1211738, 1211811, 1212445, 1212502, 1212604, 1212766, 1212901, 1213167, 1213272, 1213287, 1213304, 1213417, 1213578, 1213585, 1213586, 1213588, 1213601, 1213620, 1213632, 1213653, 1213713, 1213715, 1213747, 1213756, 1213759, 1213777, 1213810, 1213812, 1213856, 1213857, 1213863, 1213867, 1213870, 1213871, 1213872
CVE References: CVE-2022-40982, CVE-2023-0459, CVE-2023-20569, CVE-2023-21400, CVE-2023-2156, CVE-2023-2166, CVE-2023-31083, CVE-2023-3268, CVE-2023-3567, CVE-2023-3609, CVE-2023-3611, CVE-2023-3776, CVE-2023-38409, CVE-2023-3863, CVE-2023-4004
Sources used:
openSUSE Leap 15.5 (src): kernel-syms-5.14.21-150500.55.19.1, kernel-default-base-5.14.21-150500.55.19.1.150500.6.6.4, kernel-livepatch-SLE15-SP5_Update_3-1-150500.11.3.4, kernel-source-5.14.21-150500.55.19.1, kernel-obs-qa-5.14.21-150500.55.19.1, kernel-obs-build-5.14.21-150500.55.19.1
Basesystem Module 15-SP5 (src): kernel-default-base-5.14.21-150500.55.19.1.150500.6.6.4, kernel-source-5.14.21-150500.55.19.1
Development Tools Module 15-SP5 (src): kernel-obs-build-5.14.21-150500.55.19.1, kernel-syms-5.14.21-150500.55.19.1, kernel-source-5.14.21-150500.55.19.1
SUSE Linux Enterprise Live Patching 15-SP5 (src): kernel-livepatch-SLE15-SP5_Update_3-1-150500.11.3.4

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
SUSE-SU-2023:3376-1: An update that solves 15 vulnerabilities and has 27 fixes can now be installed.

Category: security (important)
Bug References: 1206418, 1207129, 1207948, 1210627, 1210780, 1210825, 1211131, 1211738, 1211811, 1212445, 1212502, 1212604, 1212766, 1212901, 1213167, 1213272, 1213287, 1213304, 1213417, 1213578, 1213585, 1213586, 1213588, 1213601, 1213620, 1213632, 1213653, 1213713, 1213715, 1213747, 1213756, 1213759, 1213777, 1213810, 1213812, 1213856, 1213857, 1213863, 1213867, 1213870, 1213871, 1213872
CVE References: CVE-2022-40982, CVE-2023-0459, CVE-2023-20569, CVE-2023-21400, CVE-2023-2156, CVE-2023-2166, CVE-2023-31083, CVE-2023-3268, CVE-2023-3567, CVE-2023-3609, CVE-2023-3611, CVE-2023-3776, CVE-2023-38409, CVE-2023-3863, CVE-2023-4004
Sources used:
openSUSE Leap 15.5 (src): kernel-syms-azure-5.14.21-150500.33.14.1, kernel-source-azure-5.14.21-150500.33.14.1
Public Cloud Module 15-SP5 (src): kernel-syms-azure-5.14.21-150500.33.14.1, kernel-source-azure-5.14.21-150500.33.14.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.