|
Bugzilla – Full Text Bug Listing |
| Summary: | GPU hang after kernel update to 6.5.4-1-default | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Tumbleweed | Reporter: | Gabriel Krisman Bertazi <gabriel.bertazi> |
| Component: | Kernel | Assignee: | openSUSE Kernel Bugs <kernel-bugs> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | gabriel.bertazi |
| Version: | Current | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | kernel log | ||
|
Description
Gabriel Krisman Bertazi
2023-10-03 13:44:03 UTC
Ah, after checking the logs a bit more, I see a bunch of these a few seconds before the hang. Maybe this is what caused the GPU reset?

kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:5 pasid:32772, for process firefox pid 3410 thread firefox:cs0 pid 3489)
kernel: amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x00003ffb78559000 from IH client 0x1b (UTCL2)
kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00500431
kernel: amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
kernel: amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x1
kernel: amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
kernel: amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x3
kernel: amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
kernel: amdgpu 0000:06:00.0: amdgpu: RW: 0x0

This comes from the firefox process, but I've seen it crash even when firefox wasn't involved.

Created attachment 869880
kernel log
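The page-fault lines quoted above follow a regular format, so one way to check which processes trigger them (firefox or otherwise) is to tally them from a saved kernel log. The following is only a sketch: the regular expression is modeled on the single fault line in this report, and the helper names `parse_fault` and `count_faulting_processes` are made up for illustration. Other kernel versions may format the message differently.

```python
import re
from collections import Counter

# Pattern modeled on the one fault line quoted in this report (assumption:
# other kernel versions may word or order the fields differently).
FAULT_RE = re.compile(
    r"page fault \(src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) "
    r"vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+), "
    r"for process (?P<process>.+?) pid (?P<pid>\d+) "
    r"thread (?P<thread>.+?) pid (?P<tid>\d+)\)"
)

def parse_fault(line):
    """Return the fields of one amdgpu page-fault line as a dict, or None."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("src_id", "ring", "vmid", "pasid", "pid", "tid"):
        fields[key] = int(fields[key])
    return fields

def count_faulting_processes(lines):
    """Tally page faults per process name across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        fault = parse_fault(line)
        if fault is not None:
            counts[fault["process"]] += 1
    return counts

# The fault line from this report, used as sample input.
sample = (
    "kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] no-retry page fault "
    "(src_id:0 ring:24 vmid:5 pasid:32772, for process firefox pid 3410 "
    "thread firefox:cs0 pid 3489)"
)
print(parse_fault(sample))
```

Feeding the lines of the attached kernel log to `count_faulting_processes` would show whether processes other than firefox also appear in the faults.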
For completeness, you might see a few of

kernel: WARNING: CPU: 11 PID: 3081 at drivers/acpi/platform_profile.c:74 platform_profile_show+0xa6/0xd0 [platform_profile]

during boot in the log in Comment 2. Those have been there since installing TW and are most likely unrelated to this issue (they should go into a separate kernel or firmware bug report). They seemed harmless, though, and I forgot to investigate/report them.

Just as a wild guess: Could you boot with amdgpu.mcbp=0?

No reply from the reporter so far. If amdgpu.mcbp=0 solves the issue, then kernel 6.5.6 has the fix:

commit 2c4cc4d787a5f332f2c61f12cdb31e01da386439
Author: Jiadong Zhu <Jiadong.Zhu@amd.com>
Date: Wed Jul 26 15:21:48 2023 +0800

    drm/amdgpu: set completion status as preempted for the resubmission

(In reply to Frank Krüger from comment #5)
> No reply from the reporter so far. If amdgpu.mcbp=0 solves the issue, then
> kernel 6.5.6 has the fix:
>
> commit 2c4cc4d787a5f332f2c61f12cdb31e01da386439
> Author: Jiadong Zhu <Jiadong.Zhu@amd.com>
> Date: Wed Jul 26 15:21:48 2023 +0800
>
> drm/amdgpu: set completion status as preempted for the resubmission

Apologies. This is a workstation and I haven't been able to try it yet. Will try to do so after hours today.

Just tried 6.5.6-1-default and it fixed the issue. Thanks. Closing. |
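For anyone retracing the two checks discussed in this thread (is `amdgpu.mcbp=0` actually on the kernel command line, and is the running kernel at least 6.5.6), a minimal Python sketch follows. The function names are hypothetical, and the version comparison is only a heuristic based on this report: it says nothing about distribution kernels that backport or revert the commit, or about series other than 6.5.

```python
import re

def has_mcbp_disabled(cmdline):
    """True if a kernel command-line string contains the token amdgpu.mcbp=0."""
    return "amdgpu.mcbp=0" in cmdline.split()

def kernel_version_tuple(release):
    """Turn a release string such as '6.5.6-1-default' into (6, 5, 6)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing patch level (e.g. '6.5') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())

def has_reported_fix(release, fixed_in=(6, 5, 6)):
    """Heuristic: this report names 6.5.6 as the first kernel with the fix."""
    return kernel_version_tuple(release) >= fixed_in

# The two kernel releases mentioned in this report.
print(has_reported_fix("6.5.4-1-default"))  # False
print(has_reported_fix("6.5.6-1-default"))  # True
```

On a running Linux system the inputs could come from the contents of `/proc/cmdline` and from `platform.release()` in the standard library.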