Bugzilla – Bug 1215470
amdgpu no-retry page fault
Last modified: 2023-10-25 13:53:45 UTC
Created attachment 869588
amdgpu journalctl logs

With the latest available kernel (6.5.2-1-default), rendering breaks at seemingly random times while using GNOME and any application (e.g. Chrome or Thunderbird). The journalctl logs show the errors, which I share in the attached file. The issue is *not* present in kernel 6.4.12-1-default, which I am now using as a fallback.
Hi, do you have the means to test certain patches? I found e77673d14f2c ("drm/amdgpu: Update invalid PTE flag setting") in v6.6-rc1 and edcfe22985d0 ("drm/amdkfd: Insert missing TLB flush on GFX10 and later") in v6.6-rc2. The former looks promising as a fix.
(In reply to Thomas Zimmermann from comment #1)
> Hi, do you have the means to test certain patches?
>
> I found
>
> e77673d14f2c ("drm/amdgpu: Update invalid PTE flag setting")
>
> in v6.6-rc1 and
>
> edcfe22985d0 ("drm/amdkfd: Insert missing TLB flush on GFX10 and later")
>
> in v6.6-rc2.
>
> The former looks promising as a fix.

The problem with testing is that this issue is happening on my work-laptop which I need up-and-running (for obvious reasons). :(

An option I see is to wait for kernel 6.6 to land in factory and eventually TW, get it via "zypper dup" and see if that helps with the bug. Or hopefully the fix lands on some bug-fix release of 6.5?

Would you be happy enough with that?
(In reply to Marco Varlese from comment #2)
> (In reply to Thomas Zimmermann from comment #1)
> > Hi, do you have the means to test certain patches?
> >
> > I found
> >
> > e77673d14f2c ("drm/amdgpu: Update invalid PTE flag setting")
> >
> > in v6.6-rc1 and
> >
> > edcfe22985d0 ("drm/amdkfd: Insert missing TLB flush on GFX10 and later")
> >
> > in v6.6-rc2.
> >
> > The former looks promising as a fix.
>
> The problem with testing is that this issue is happening on my work-laptop
> which I need up-and-running (for obvious reasons). :(
>
> An option I see is to wait for kernel 6.6 to land in factory and eventually
> TW, get it via "zypper dup" and see if that helps with the bug. Or hopefully
> the fix lands on some bug-fix release of 6.5?
>
> Would you be happy enough with that?

I haven't seen these patches in linux-stable (yet). I can attempt to backport them into TW. I'll also try to reproduce this locally.
> > e77673d14f2c ("drm/amdgpu: Update invalid PTE flag setting")

FYI I have sent this patch for inclusion in the stable branch.
*** Bug 1215695 has been marked as a duplicate of this bug. ***
Apparently kernel 6.5.5 (from build.o.o devel project) fixed it for me.

~> LANG=C sudo zypper info kernel-default
...
Information for package kernel-default:
---------------------------------------
Repository     : Kernel builds for branch stable (standard)
Name           : kernel-default
Version        : 6.5.5-2.1.g6cf5261
Arch           : x86_64
Vendor         : obs://build.opensuse.org/Kernel
Installed Size : 248.2 MiB
Installed      : Yes
Status         : up-to-date
Source package : kernel-default-6.5.5-2.1.g6cf5261.nosrc
Upstream URL   : https://www.kernel.org/
Summary        : The Standard Kernel
Description    :
    The standard kernel for both uniprocessor and multiprocessor systems.

    Source Timestamp: 2023-09-25 10:19:02 +0000
    GIT Revision: 6cf5261da0ebc2ca4f200ee6fe0fde9d6c3eff3e
    GIT Branch: stable
I confirm, having run kernel 6.5.8-1-default for some time now, that the bug is no longer there. I think we can close this bug as resolved. Thank you for looking into this and fixing it so promptly!