Bugzilla – Full Text Bug Listing
| Summary: | Kernel hard lockup under mild GPU load | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Tumbleweed | Reporter: | llyyr <llyyr.public> |
| Component: | Kernel | Assignee: | openSUSE Kernel Bugs <kernel-bugs> |
| Status: | NEW --- | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Critical | ||
| Priority: | P5 - None | CC: | llyyr.public, tiwai |
| Version: | Current | ||
| Target Milestone: | --- | ||
| Hardware: | x86-64 | ||
| OS: | openSUSE Tumbleweed | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
Description
llyyr
2024-05-23 13:56:24 UTC

Comment 1
Takashi Iwai
It's quite difficult to debug without any logs, unfortunately. You can try to set up kdump and get the kernel crash dump (at least the dmesg output), too. If it were a kernel panic, the crash dump would be triggered automatically. Other than that, you can trigger it manually via magic sysrq-c.

Comment 2
llyyr
(In reply to Takashi Iwai from comment #1)
> It's quite difficult to debug without any logs, unfortunately.
> You can try to set up kdump and get the kernel crash dump (at least the
> dmesg output), too. If it were a kernel panic, the crash dump would be
> triggered automatically. Other than that, you can trigger it manually via
> magic sysrq-c.

I bisected it down to the amdgpu changes in 6.7-rc1 and reported it upstream here: https://gitlab.freedesktop.org/drm/amd/-/issues/3403

Unfortunately, I can't bisect it down to a specific commit because amdgpu is broken at random commits in that tree.

Comment 3
Takashi Iwai
Thanks.

As a blind shot (as it's a 6.7 regression), could you later try a patched test kernel in OBS home:tiwai:bsc1219983 repo? Once the build finishes, the package will appear at
http://download.opensuse.org/repositories/home:/tiwai:/bsc1219983/standard/

Comment 4
llyyr
(In reply to Takashi Iwai from comment #3)
> Thanks.
>
> As a blind shot (as it's a 6.7 regression), could you later try a patched
> test kernel in OBS home:tiwai:bsc1219983 repo? Once the build
> finishes, the package will appear at
> http://download.opensuse.org/repositories/home:/tiwai:/bsc1219983/standard/

It has the same issue; I get a hard lockup. I'd imagine it's related to power management or GPU clocks, because these crashes are very similar to what happens when you run a very unstable overclock and stress your system a little. Except I'm not overclocking.

Comment 5
Takashi Iwai
So the patch didn't seem to help in your case?
FWIW, it was a one-line revert mentioned in https://gitlab.freedesktop.org/drm/amd/-/issues/3142

The best you can do for the moment would be to try to catch any kernel crash or similar messages and report / track the bug in the upstream gitlab.freedesktop.org issues.

Comment 6
llyyr
(In reply to Takashi Iwai from comment #5)
> So the patch didn't seem to help in your case?
> FWIW, it was a one-line revert mentioned in
> https://gitlab.freedesktop.org/drm/amd/-/issues/3142
>
> The best you can do for the moment would be to try to catch any kernel crash
> or similar messages and report / track the bug in the upstream
> gitlab.freedesktop.org issues.

Actually, that patch does work, thanks! When I tried it out, I must have accidentally booted into the latest kernel instead of the one from your branch.

Comment 7
Takashi Iwai
That's good news. At least we're heading in the right direction.

I can backport the workaround patch to TW, but since the relevant code was significantly rewritten upstream, let's first check whether the rewrite covers your problem.

I'm building another test kernel with two upstream backports:
2d5bb791e24f43b6b4231b7973009987bbcc9b06
drm/amd/display: Implement update_planes_and_stream_v3 sequence
d62d5551dd615f9e488b13595d69b308cd019e16
drm/amd/display: Backup and restore only on full updates

It's being built in OBS home:tiwai:bsc1225147 repo. The package will appear at
http://download.opensuse.org/repositories/home:/tiwai:/bsc1225147/standard/
Please give it a try later.

Meanwhile, you can join the upstream gitlab.freedesktop.org issues mentioned in comment 2, echoing that the revert helped, too.

Comment 8
llyyr
(In reply to Takashi Iwai from comment #7)
> That's good news. At least we're heading in the right direction.
>
> I can backport the workaround patch to TW, but since the relevant code was
> significantly rewritten upstream, let's first check whether the rewrite
> covers your problem.
>
> I'm building another test kernel with two upstream backports:
> 2d5bb791e24f43b6b4231b7973009987bbcc9b06
> drm/amd/display: Implement update_planes_and_stream_v3 sequence
> d62d5551dd615f9e488b13595d69b308cd019e16
> drm/amd/display: Backup and restore only on full updates
>
> It's being built in OBS home:tiwai:bsc1225147 repo. The package will appear at
> http://download.opensuse.org/repositories/home:/tiwai:/bsc1225147/standard/
> Please give it a try later.
>
That does not resolve the issue; I can still reproduce the hard lockup.

> Meanwhile, you can join the upstream gitlab.freedesktop.org issues
> mentioned in comment 2, echoing that the revert helped, too.

I did: https://gitlab.freedesktop.org/drm/amd/-/issues/3142#note_2427275

Comment 9
Takashi Iwai
(In reply to llyyr from comment #8)
> (In reply to Takashi Iwai from comment #7)
> > That's good news. At least we're heading in the right direction.
> >
> > I can backport the workaround patch to TW, but since the relevant code was
> > significantly rewritten upstream, let's first check whether the rewrite
> > covers your problem.
> >
> > I'm building another test kernel with two upstream backports:
> > 2d5bb791e24f43b6b4231b7973009987bbcc9b06
> > drm/amd/display: Implement update_planes_and_stream_v3 sequence
> > d62d5551dd615f9e488b13595d69b308cd019e16
> > drm/amd/display: Backup and restore only on full updates
> >
> > It's being built in OBS home:tiwai:bsc1225147 repo. The package will appear at
> > http://download.opensuse.org/repositories/home:/tiwai:/bsc1225147/standard/
> > Please give it a try later.
> >
> That does not resolve the issue; I can still reproduce the hard lockup.

Thanks, good to know.

Just to be sure, could you try the kernel-vanilla package in my OBS home:tiwai:kernel:drm-tip repo?
http://download.opensuse.org/repositories/home:/tiwai:/kernel:/drm-tip/standard/

Comment 10
llyyr
(In reply to Takashi Iwai from comment #9)
> Just to be sure, could you try the kernel-vanilla package in my OBS
> home:tiwai:kernel:drm-tip repo?
>
> http://download.opensuse.org/repositories/home:/tiwai:/kernel:/drm-tip/standard/

Freezes. The only thing that helps is the patch that deletes this line:
https://github.com/torvalds/linux/blob/c13320499ba0efd93174ef6462ae8a7a2933f6e7/drivers/gpu/drm/amd/display/dc/core/dc_state.c#L323
But it's definitely not ideal.

Comment 11
Takashi Iwai
OK, then please update the upstream bug tracker info accordingly. It's useful to know that the very latest code still suffers from the same problem.

Let's take the workaround temporarily for now until upstream gets a proper resolution. It's not ideal, but better than sorry.

Comment 12
llyyr
(In reply to Takashi Iwai from comment #11)
> OK, then please update the upstream bug tracker info accordingly. It's
> useful to know that the very latest code still suffers from the same problem.
>
6.10-rc1 should be tagged later today; I'll give that a spin and then update upstream. Might it also be worth trying out the https://gitlab.freedesktop.org/agd5f/linux/-/tree/amd-staging-drm-next branch?

> Let's take the workaround temporarily for now until upstream gets a
> proper resolution. It's not ideal, but better than sorry.

Thanks!

Comment 13
Takashi Iwai
(In reply to llyyr from comment #12)
> (In reply to Takashi Iwai from comment #11)
> > OK, then please update the upstream bug tracker info accordingly. It's
> > useful to know that the very latest code still suffers from the same problem.
> >
> 6.10-rc1 should be tagged later today; I'll give that a spin and then update
> upstream. Might it also be worth trying out the
> https://gitlab.freedesktop.org/agd5f/linux/-/tree/amd-staging-drm-next
> branch?

Sure, worth trying out.

Comment 14
llyyr
Gave 6.10-rc1 a shot and got a freeze within minutes. The patch still works around the issue, though.