Bug 1177428

Summary: AMDGPU resume fail
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Karl Mistelberger <karl.mistelberger>
Component: Kernel
Assignee: openSUSE Kernel Bugs <kernel-bugs>
Status: RESOLVED WORKSFORME
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: 2012gdwu, alexander.deucher, chris-hartmann, felipefm, fvogt, jslaby, karl.mistelberger, kaykaykay123, patrik.jakobsson, tiwai, tzimmermann
Version: Current
Target Milestone: ---
Hardware: x86-64
OS: openSUSE Tumbleweed
URL: https://gitlab.freedesktop.org/drm/amd/-/issues/1354
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: old boot
new boot
new boot
last good boot
fresh install of snapshot 20201007
snapshot 20201008 even worse
20201008 with firmware 20201007
still freezing
Picasso firmware files
dmesg output
journal suspend to RAM

Description Karl Mistelberger 2020-10-07 11:31:42 UTC
Resume fails with a frozen desktop:

[   24.398940] kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[   24.398978] kernel: [drm] PSP is resuming...
[   24.418828] kernel: [drm] reserve 0x400000 from 0xf47f800000 for PSP TMR
[   24.427721] kernel: [drm] psp command (0x5) failed and response status is (0xFFFF0007)
[   24.873381] kernel: ata2.00: supports DRM functions and may not be fully accessible
[   24.873629] kernel: ata5.00: supports DRM functions and may not be fully accessible
[   24.875318] kernel: ata5.00: supports DRM functions and may not be fully accessible
[   24.875414] kernel: [drm] kiq ring mec 2 pipe 1 q 0
[   24.875815] kernel: ata2.00: supports DRM functions and may not be fully accessible
[   24.921903] kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
[   25.403984] kernel: [drm] Fence fallback timer expired on ring sdma0
[   25.436018] kernel: [drm] Fence fallback timer expired on ring gfx
[   25.436197] kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-22).
[   25.436204] kernel: [drm:process_one_work] *ERROR* ib ring test failed (-22).
[   34.512086] kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[   34.512139] kernel: [drm] PSP is resuming...
[   34.532000] kernel: [drm] reserve 0x400000 from 0xf47f800000 for PSP TMR
[   34.541638] kernel: [drm] psp command (0x5) failed and response status is (0xFFFF0007)
[   34.986957] kernel: ata2.00: supports DRM functions and may not be fully accessible
[   34.989592] kernel: ata2.00: supports DRM functions and may not be fully accessible
[   34.990819] kernel: ata5.00: supports DRM functions and may not be fully accessible
[   34.992628] kernel: ata5.00: supports DRM functions and may not be fully accessible
[   35.013814] kernel: [drm] kiq ring mec 2 pipe 1 q 0
[   35.060526] kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
[   35.541618] kernel: [drm] Fence fallback timer expired on ring sdma0
[   36.085419] kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
[   36.085428] kernel: [drm:process_one_work] *ERROR* ib ring test failed (-110).


inxi -zaG:

Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Picasso driver: amdgpu v: kernel bus ID: 06:00.0 chip ID: 1002:15d8 
           Display: server: X.Org 1.20.9 compositor: kwin_x11 driver: amdgpu display ID: :0 screens: 1 
           Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.0x11.2") s-diag: 582mm (22.9") 
           Monitor-1: DVI-D-0 res: 1920x1080 hz: 60 dpi: 79 size: 621x341mm (24.4x13.4") diag: 708mm (27.9") 
           OpenGL: renderer: AMD RAVEN (DRM 3.38.0 5.8.12-1-default LLVM 10.0.1) v: 4.6 Mesa 20.1.8 direct render: Yes
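
For triage, the failing lines in a log like the one above can be pulled out mechanically. This is only a minimal sketch: it assumes the kernel log is piped in on stdin, and the function name `drm_errors` is made up for illustration. It prints the ring name and the error code from each "IB test failed" message. On Linux, -22 is EINVAL and -110 is ETIMEDOUT.

```shell
# drm_errors: read kernel log lines on stdin and print "<ring> <errno>" for
# every amdgpu "IB test failed on <ring> (<errno>)" message.
# The function name is hypothetical; only sed is required.
drm_errors() {
    sed -n 's/.*IB test failed on \([a-z0-9_.]*\) (\(-[0-9]*\)).*/\1 \2/p'
}

# Example usage (needs read access to the journal):
#   journalctl -k -b 0 --no-hostname | drm_errors
```

Run against the log in the description, this would report the gfx ring once with -22 and once with -110.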
Comment 1 Stefan Dirsch 2020-10-07 14:19:46 UTC
Hmm. Kernel issue.
Comment 2 Takashi Iwai 2020-10-07 14:33:42 UTC
Is this a regression?
Comment 3 Karl Mistelberger 2020-10-08 04:55:30 UTC
(In reply to Takashi Iwai from comment #2)
> Is this a regression?

I assembled the machine in July. IIRC suspend/resume worked then. Later I ran into trouble with graphics, which went away with further updating:

https://forums.opensuse.org/showthread.php/544219-Amdgpu-Trouble

drm is a great idea, but to me it seems it still has teething problems.
Comment 4 Takashi Iwai 2020-10-08 13:08:39 UTC
But how about this suspend/resume problem?  Is it a new issue that worked in the past version?
Comment 5 Karl Mistelberger 2020-10-08 15:50:46 UTC
(In reply to Takashi Iwai from comment #4)
> But how about this suspend/resume problem?  Is it a new issue that worked in
> the past version?

journal says:

....
Jul 21 08:14:20 localhost systemd[1]: Condition check resulted in Load Kernel Module drm being skipped.
....

Thus I presume suspend/resume worked because drm was not used. Now it gets loaded and fails.

BTW: How can I skip loading drm for the time being?
Comment 6 Takashi Iwai 2020-10-08 16:01:41 UTC
You *did* use amdgpu driver in the early usage, no?  The message might be a red herring.

So, please clarify the situation:

- Which Tumbleweed snapshot it worked in which configuration
- What exactly gets broken now: tell the procedure; how to suspend and how to 
resume?
- How is your configuration now

We really need to identify the difference if it worked somehow in the past.
Comment 7 Karl Mistelberger 2020-10-08 16:26:29 UTC
Created attachment 842420 [details]
old boot
Comment 8 Karl Mistelberger 2020-10-08 16:27:23 UTC
(In reply to Takashi Iwai from comment #6)
> You *did* use amdgpu driver in the early usage, no?  The message might be a
> red herring.

I am unsure. I did a plain install without further tinkering.

> Which Tumbleweed snapshot it worked in which configuration

Snapshots are gone, but Tumbleweed was kept up to date by running "zypper dup". See attached journal.

> - What exactly gets broken now: tell the procedure; how to suspend and how
> to resume?

From the journal:

[   89.306023] systemd[1]: Starting Suspend...
[   89.315625] systemd-sleep[2156]: INFO: Skip running /usr/lib/systemd/system-sleep/grub2.sleep for suspend
[   89.316552] systemd-sleep[2151]: Suspending system...
[  100.329743] systemd-sleep[2151]: System resumed.
[  100.333370] systemd-sleep[2245]: INFO: Skip running /usr/lib/systemd/system-sleep/grub2.sleep for suspend
[  100.334508] systemd[1]: systemd-suspend.service: Succeeded.
[  100.334894] systemd[1]: Finished Suspend.

> - How is your configuration now

Current system is:

Operating System: openSUSE Tumbleweed 20201005
KDE Plasma Version: 5.19.5
KDE Frameworks Version: 5.74.0
Qt Version: 5.15.1
Kernel Version: 5.8.12-1-default
OS Type: 64-bit
Processors: 8 × AMD Ryzen 5 3400G with Radeon Vega Graphics
Memory: 29.3 GiB of RAM
Graphics Processor: AMD RAVEN

See also attached journal.
Comment 9 Karl Mistelberger 2020-10-08 16:30:56 UTC
Created attachment 842421 [details]
new boot
Comment 10 Takashi Iwai 2020-10-08 16:57:06 UTC
(In reply to Karl Mistelberger from comment #8)
> (In reply to Takashi Iwai from comment #6)
> > You *did* use amdgpu driver in the early usage, no?  The message might be a
> > red herring.
> 
> I am unsure. I did a plain install without further tinkering.

The log you attached showed the amdgpu drm driver being used, so it was deployed.

> > Which Tumbleweed snapshot it worked in which configuration
> 
> Snapshots are gone, but Tumbleweed was kept up to date by running "zypper
> dup". See attached journal.

So no configuration change in your side.

> > - What exactly gets broken now: tell the procedure; how to suspend and how
> > to resume?
> 
> From the journal:
> 
> [   89.306023] systemd[1]: Starting Suspend...
> [   89.315625] systemd-sleep[2156]: INFO: Skip running
> /usr/lib/systemd/system-sleep/grub2.sleep for suspend
> [   89.316552] systemd-sleep[2151]: Suspending system...
> [  100.329743] systemd-sleep[2151]: System resumed.
> [  100.333370] systemd-sleep[2245]: INFO: Skip running
> /usr/lib/systemd/system-sleep/grub2.sleep for suspend
> [  100.334508] systemd[1]: systemd-suspend.service: Succeeded.
> [  100.334894] systemd[1]: Finished Suspend.

Erm, it's not clear "how" you triggered the suspend and resumed.  By the lid close (if laptop), from KDE menu, or whatever?  Just to be sure.

Also, do I understand correctly that you're dealing with suspend-to-RAM, not hibernation?

> > - How is your configuration now
> 
> Current system is:
> 
> Operating System: openSUSE Tumbleweed 20201005
> KDE Plasma Version: 5.19.5
> KDE Frameworks Version: 5.74.0
> Qt Version: 5.15.1
> Kernel Version: 5.8.12-1-default
> OS Type: 64-bit
> Processors: 8 × AMD Ryzen 5 3400G with Radeon Vega Graphics
> Memory: 29.3 GiB of RAM
> Graphics Processor: AMD RAVEN
> 
> See also attached journal.

Thanks!

The old log showed that it was 5.7.x kernel.  Could you try to install the old kernel from OBS home:tiwai:5.7 repo, boot with it and test the suspend/resume?
  http://download.opensuse.org/repositories/home:/tiwai:/kernel:/5.7/standard/
Comment 11 Karl Mistelberger 2020-10-08 17:27:15 UTC
> > (In reply to Takashi Iwai from comment #6)
 
> Erm, it's not clear "how" you triggered the suspend and resumed.  By the lid
> close (if laptop), from KDE menu, or whatever?  Just to be sure.

KDE > Application Starter > Leave > Suspend

> Also, do I understand correct that you're dealing with the suspend-to-RAM,
> not the hibernation, right?

Suspend to RAM.

> The old log showed that it was 5.7.x kernel.  Could you try to install the
> old kernel from OBS home:tiwai:5.7 repo, boot with it and test the
> suspend/resume?
>  
> http://download.opensuse.org/repositories/home:/tiwai:/kernel:/5.7/standard/

Installed and got a freeze too. See attachment.
Comment 12 Karl Mistelberger 2020-10-08 17:28:07 UTC
Created attachment 842426 [details]
new boot
Comment 13 Takashi Iwai 2020-10-08 17:36:05 UTC
OK, thanks.
Then could you try to crawl through the old journal and check which kernel version started showing the problem?  The one you showed with 5.7.7 and the lastly tested was 5.7.12.  There might be something between them you've tested.
Comment 14 Takashi Iwai 2020-10-08 17:42:12 UTC
Also, please check whether "amd_iommu=off" boot option makes any improvement wrt this bug.
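
Before drawing conclusions from such a test, it may be worth confirming that the option really reached the kernel. A minimal sketch, assuming the running kernel's command line is readable from /proc/cmdline; the helper name `has_cmdline_param` is hypothetical, and the optional second argument exists only so the check can be tried on a saved file.

```shell
# has_cmdline_param PARAM [FILE]
# Succeeds if PARAM appears as a whole word on the kernel command line.
# FILE defaults to /proc/cmdline; the helper name is made up for this sketch.
has_cmdline_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put one parameter per line, then look for an exact, fixed-string match.
    tr ' ' '\n' < "$file" | grep -Fxq -- "$param"
}

# Example usage:
#   has_cmdline_param amd_iommu=off && echo "booted with amd_iommu=off"
```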
Comment 15 Karl Mistelberger 2020-10-08 18:01:58 UTC
(In reply to Takashi Iwai from comment #13)
> OK, thanks.
> Then could you try to crawl through the old journal and check which kernel
> version started showing the problem?  The one you showed with 5.7.7 and the
> lastly tested was 5.7.12.  There might be something between them you've
> tested.

Last good resume from journal is with 8.4-1-default.
Comment 16 Karl Mistelberger 2020-10-08 18:03:03 UTC
(In reply to Takashi Iwai from comment #14)
> Also, please check whether "amd_iommu=off" boot option makes any improvement
> wrt this bug.

Booted with amd_iommu=off and got a freeze too.
Comment 17 Karl Mistelberger 2020-10-08 18:14:41 UTC
Created attachment 842428 [details]
last good boot
Comment 18 Takashi Iwai 2020-10-09 13:10:12 UTC
Hrm, 5.8-series still worked while the test with 5.7.12 failed now.  This sounds rather like some change outside the kernel in the past triggered the problem?
Comment 19 Takashi Iwai 2020-10-09 13:21:05 UTC
Ales, does this ring a bell?
It's Tumbleweed, so everything should be fairly close to the latest upstream.
Comment 20 Karl Mistelberger 2020-10-09 18:55:52 UTC
Created attachment 842483 [details]
fresh install of snapshot 20201007

yet another set of error messages during suspend/resume
Comment 21 Karl Mistelberger 2020-10-10 05:05:52 UTC
Created attachment 842489 [details]
snapshot 20201008 even worse

really bad experience; needed to roll back to 20201007 (which worked perfectly).
Comment 22 Takashi Iwai 2020-10-10 07:49:17 UTC
Hm, on 20201008, the amdgpu driver goes south even at a fresh boot time.
I noticed that this new TW snapshot includes an update of the kernel-firmware package.  Could you try to downgrade the kernel-firmware package from the 20201008 state to the version in 20201007?  You can find the old package in the OBS history repo,
  http://download.opensuse.org/history/
Comment 23 Takashi Iwai 2020-10-10 07:58:31 UTC
BTW you need to update only amdgpu firmware:
% zypper in --oldpackage --force http://download.opensuse.org/history/20201007/tumbleweed/repo/oss/noarch/kernel-firmware-amdgpu-20200916-1.1.noarch.rpm
Comment 24 Karl Mistelberger 2020-10-10 09:50:45 UTC
Created attachment 842491 [details]
20201008 with firmware 20201007

Booted into 20201007, locked all firmware packages and duped to 20201008. System boots, but again freezes on resume.
Comment 25 Takashi Iwai 2020-10-10 10:33:06 UTC
OK, then the latest crash at boot was indeed a regression of amdgpu firmware.

Just to make sure: the oldest kernel-firmware found in OBS history repo is:
  http://download.opensuse.org/history/20200907/tumbleweed/repo/oss/noarch/kernel-firmware-amdgpu-20200807-1.2.noarch.rpm

Could you try to downgrade to this version and retest?

Also, please try to boot with "firmware_class.dyndbg=+p" boot option, and give the dmesg output.  This will contain the debug prints showing which firmware files are loaded.
Comment 26 Karl Mistelberger 2020-10-10 11:58:54 UTC
Created attachment 842492 [details]
still freezing
Comment 27 Christian Hartmann 2020-10-10 21:46:47 UTC
I might have a similar issue... After doing a zypper dup today I cannot boot my computer.

In the rescue system I was able to see some errors in journalctl regarding amdgpu.

(In reply to Takashi Iwai from comment #23)
> BTW you need to update only amdgpu firmware:
> % zypper in --oldpackage --force
> http://download.opensuse.org/history/20201007/tumbleweed/repo/oss/noarch/
> kernel-firmware-amdgpu-20200916-1.1.noarch.rpm

Downgrading the kernel-firmware-amdgpu package fixed the issue.
Comment 28 Takashi Iwai 2020-10-11 08:05:42 UTC
(In reply to Christian Hartmann from comment #27)
> Downgrading the kernel-firmware-amdgpu package fixed the issue.

Could you give dmesg output with "firmware_class.dyndbg=+p" boot option, too?
We need to check which firmware is involved.

In the case of Karl, it was amdgpu/picasso*.
Comment 29 Takashi Iwai 2020-10-11 08:21:29 UTC
(In reply to Karl Mistelberger from comment #26)
> Created attachment 842492 [details]
> still freezing

Thanks.  It's Picasso board, and this was already a problem in the past, hence we shipped the older firmware as a workaround.
At the latest kernel-firmware update, we removed the workaround as I was informed that the issue should have been fixed, but apparently it's not fixed.  So I'm going to put the old firmware again.

However, the question is which old one; I'd really like to see whether the original issue (the GPU error at resume) comes from the firmware or not.

Now I uploaded various versions of picasso firmware files taken from linux-firmware.git.  The tarball contains subdirectory for each version (e.g. 19.50, 20.10, ...).  For testing it, try the following:

- Create /lib/firmware/updates/amdgpu directory:
  % mkdir -p /lib/firmware/updates/amdgpu

- Copy the contents of the firmware version you want to test (e.g. 19.50):
  % cp 19.50/amdgpu/picasso* /lib/firmware/updates/amdgpu/

- Rebuild initrd and retest:
  % mkinitrd
  % reboot

The version 20.40 is the same one as the latest kernel-firmware package, hence this is supposed to be broken.  I included it to be sure.

Please check each version and let me know the behavior.  Thanks!
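
The copy steps above can be wrapped so that switching between versions is less error-prone. This is a sketch, not part of the original instructions: the function name `stage_firmware` and the optional firmware-root argument are assumptions, and removing previously staged picasso files is an extra precaution so that two versions never get mixed. Rebuilding the initrd and rebooting are left to the caller.

```shell
# stage_firmware SRC_DIR VERSION [FW_ROOT]
# Copy SRC_DIR/VERSION/amdgpu/picasso* into FW_ROOT/updates/amdgpu.
# FW_ROOT defaults to /lib/firmware. The function name is hypothetical.
stage_firmware() {
    src=$1
    version=$2
    fw_root=${3:-/lib/firmware}
    dest="$fw_root/updates/amdgpu"

    # Expand the glob into the positional parameters; if nothing matches,
    # the pattern stays literal and the -e test below fails.
    set -- "$src/$version"/amdgpu/picasso*
    if [ ! -e "$1" ]; then
        echo "no picasso firmware under $src/$version/amdgpu" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1
    # Added precaution (not in the original steps): drop files staged for a
    # previously tested version before copying the new set.
    rm -f "$dest"/picasso*
    cp "$@" "$dest"/
}

# Example usage, as root, from the directory where the tarball was unpacked:
#   stage_firmware . 19.50 && mkinitrd && reboot
```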
Comment 30 Takashi Iwai 2020-10-11 08:22:02 UTC
Created attachment 842495 [details]
Picasso firmware files
Comment 31 Karl Mistelberger 2020-10-11 09:04:27 UTC
(In reply to Takashi Iwai from comment #29)

> Please check each version and let me know the behavior.  Thanks!

Tested all of them with 5.8.14-1-default and none of them works. Just to be sure:

hofkirchen:~ # journalctl -b 0 --no-h  --grep amdgpu|grep Loading
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_gpu_info.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_sdma.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_asd.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_ta.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_pfp.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_me.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_ce.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_rlc_am4.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_mec.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_mec2.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/raven_dmcu.bin
Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_vcn.bin
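
In the listing above, every picasso file comes from the override directory, while raven_dmcu.bin is still loaded from the packaged location. A small filter makes that kind of check quicker for each tested version. It is only a sketch: the function names are made up, and it assumes the "Loading firmware from" debug lines are fed in on stdin.

```shell
# fw_paths: print the path from every "Loading firmware from <path>" line.
fw_paths() {
    sed -n 's/.*Loading firmware from \(.*\)$/\1/p'
}

# fw_not_overridden: print only the paths that were NOT taken from an
# updates/ directory, i.e. files still coming from the installed package.
# Both function names are hypothetical.
fw_not_overridden() {
    fw_paths | grep -v '/updates/'
}

# Example usage:
#   journalctl -b 0 --no-hostname --grep amdgpu | fw_not_overridden
```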
Comment 32 Takashi Iwai 2020-10-11 09:59:15 UTC
Thank you for quick testing.  This concluded that the resume problem is no regression of the recent firmware files, at least.

Also, could you tell which firmware version did boot properly?  I suppose 20.40 caused the same problem?

BTW, if you get stuck at boot due to the graphics problem, you can boot once with "nomodeset" boot option, and fix/revert things.  The option will disable the native DRM driver.  For reverting the manual firmware override, simply remove files/directories /lib/firmware/updates/* and rebuild initrd (call mkinitrd).
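
The revert can be sketched as a small helper. The assumptions here are that the override lives under /lib/firmware/updates (the directory created in comment 29) and that the initrd is rebuilt afterwards; the function name `clear_fw_override` and its optional argument are made up for illustration.

```shell
# clear_fw_override [FW_ROOT]
# Remove everything below FW_ROOT/updates so the packaged firmware is used
# again. FW_ROOT defaults to /lib/firmware; the function name is hypothetical.
clear_fw_override() {
    fw_root=${1:-/lib/firmware}
    # ${fw_root:?} aborts rather than expanding to "/updates/*" should the
    # variable ever be empty.
    rm -rf "${fw_root:?}"/updates/*
}

# Example usage, as root (after booting with nomodeset if necessary):
#   clear_fw_override && mkinitrd && reboot
```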
Comment 33 Karl Mistelberger 2020-10-11 10:40:59 UTC
(In reply to Takashi Iwai from comment #32)
> Thank you for quick testing.  This concluded that the resume problem is no
> regression of the recent firmware files, at least.
> 
> Also, could you tell which firmware version did boot properly?  I suppose
> 20.40 caused the same problem?

Never assume anything. All of the five booted correctly and all of them froze upon resume.
Comment 34 Takashi Iwai 2020-10-11 13:17:25 UTC
Hmm.  The latest TW kernel-firmware package contains 20.40, so this should have triggered the same problem.

Could you confirm the following?
- Remove /lib/firmware/updates/*
- Install again the latest TW kernel-firmware (20201008)
- mkinitrd and reboot

If this shows the boot problem, try to put picasso 20.40 firmware again /lib/firmware/updates, mkinitrd and retest.
Comment 35 Karl Mistelberger 2020-10-11 13:33:33 UTC
(In reply to Takashi Iwai from comment #34)
> Hmm.  The latest TW kernel-firmware package contains 20.40, so this should
> have triggered the same problem.
> 
> Could you confirm the following?
> - Remove /lib/firmware/updates/*
> - Install again the latest TW kernel-firmware (20201008)
> - mkinitrd and reboot

Boot fails.
 
> If this shows the boot problem, try to put picasso 20.40 firmware again
> /lib/firmware/updates, mkinitrd and retest.

Boot works. :-)
Comment 36 Christian Hartmann 2020-10-11 17:42:16 UTC
Created attachment 842496 [details]
dmesg output

(In reply to Takashi Iwai from comment #28)
> (In reply to Christian Hartmann from comment #27)
> > Downgrading the kernel-firmware-amdgpu package fixed the issue.
> 
> Could you give dmesg output with "firmware_class.dyndbg=+p" boot option, too?
> We need to check which firmware is involved.
> 
> In the case of Karl, it was amdgpu/picasso*.

I've uploaded my dmesg output... And yes, it also looks like picasso...
Comment 37 Takashi Iwai 2020-10-11 17:53:58 UTC
(In reply to Karl Mistelberger from comment #35)
> (In reply to Takashi Iwai from comment #34)
> > Hmm.  The latest TW kernel-firmware package contains 20.40, so this should
> > have triggered the same problem.
> > 
> > Could you confirm the following?
> > - Remove /lib/firmware/updates/*
> > - Install again the latest TW kernel-firmware (20201008)
> > - mkinitrd and reboot
> 
> Boot fails.
>  
> > If this shows the boot problem, try to put picasso 20.40 firmware again
> > /lib/firmware/updates, mkinitrd and retest.
> 
> Boot works. :-)

Weird...  Could you try the following?

- Remove /lib/firmware/updates again, mkinitrd, and confirm that you get the unbootable state again

- Boot with nomodeset, then run
  % unxz -f /lib/firmware/amdgpu/picasso*.xz
  mkinitrd, reboot and check whether it works now
Comment 38 Takashi Iwai 2020-10-11 17:55:32 UTC
(In reply to Christian Hartmann from comment #36)
> I've uploaded my dmesg output... And yes, it also looks like picasso...

OK, then could you also check the test in comment 29?
Comment 39 Karl Mistelberger 2020-10-11 18:15:16 UTC
(In reply to Takashi Iwai from comment #37)
> Weird...  Could you try the following?
> 
> - Remove /lib/firmware/updates again, mkinitrd, and confirm that you get the
> unbootable state again
> 
> - Boot with nomodeset, then run
>   % unxz -f /lib/firmware/amdgpu/picasso*.xz
>   mkinitrd, reboot and check whether it works now

Tried that:

Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_gpu_info.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_sdma.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_asd.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_ta.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_pfp.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_me.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_ce.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_rlc_am4.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_mec.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_mec2.bin
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/raven_dmcu.bin.xz
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_vcn.bin

Uncompressed, ran mkinitrd and rebooted. Boot failed:

Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
Oct 11 20:00:30 kernel: [drm:amdgpu_device_ip_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v9_0> failed -110
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: amdgpu: amdgpu_device_ip_init failed
Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: amdgpu: Fatal error during GPU init
Oct 11 20:00:30 kernel: kvm: disabled by bios
Oct 11 20:00:30 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
Oct 11 20:00:30 kernel: #PF: supervisor read access in kernel mode
Oct 11 20:00:30 kernel: #PF: error_code(0x0000) - not-present page
Oct 11 20:00:30 systemd-udevd[513]: 0000:06:00.0: Worker [537] failed
Oct 11 20:00:30 kernel: kvm: disabled by bios
Oct 11 20:00:31 sddm[931]: Failed to read display number from pipe
Oct 11 20:00:31 sddm[931]: Display server failed to start. Exiting
Oct 11 20:00:31 kernel: BUG: unable to handle page fault for address: 0000000000016928
Oct 11 20:00:31 kernel: #PF: supervisor read access in kernel mode
Oct 11 20:00:31 kernel: #PF: error_code(0x0000) - not-present page
Oct 11 20:00:31 systemd[1]: Failed to start X Display Manager.

Recovered by forced install of kernel-firmware-amdgpu and copying 20.40/amdgpu/picasso* to /lib/firmware/updates/amdgpu/, mkinitrd and reboot.
Comment 40 Karl Mistelberger 2020-10-11 18:19:57 UTC
Manjaro Version 5.8.11-1-MANJARO + amd-ucode.img works fine for suspend/resume
Comment 41 Christian Hartmann 2020-10-11 18:25:34 UTC
(In reply to Takashi Iwai from comment #38)
> (In reply to Christian Hartmann from comment #36)
> > I've uploaded my dmesg output... And yes, it also looks like picasso...
> 
> OK, then could you also check the test in comment 29?

I just did that... I first tried with 20.30 and after that with 20.40. Both times the system booted just fine.
Comment 42 Zbigniew Luszpinski 2020-10-11 22:36:50 UTC
Same issue here: freeze on boot after latest zypper dup on Tumbleweed at line:
fb0: switching to amdgpudrmfb from EFI VGA
and cpu fan goes on max rpm.
HP Pavilion Laptop 15-cw1xxx
[AMD/ATI] Picasso [1002:15d8] (rev c3)
ATOM BIOS: 113-PICASSO-114

Solution works:
boot with the nomodeset parameter to reach the desktop, then downgrade the broken package:
zypper in --oldpackage --force http://download.opensuse.org/history/20201007/tumbleweed/repo/oss/noarch/kernel-firmware-amdgpu-20200916-1.1.noarch.rpm
choose Solution 3: break kernel-firmware-amdgpu by ignoring some of its dependencies

There is a thread on the openSUSE forum regarding this issue:
https://forums.opensuse.org/showthread.php/545773-AMDGPU-failure-after-zypper-dup-today

greetings,
Zbigniew
Comment 43 Takashi Iwai 2020-10-12 06:30:50 UTC
Now can people with a Picasso board test the kernel-firmware-amdgpu package in the OBS Kernel:HEAD repo?  I didn't revert the amdgpu firmware there yet, but there was an error in the split / copy script, which was fixed first.
  http://download.opensuse.org/repositories/Kernel:/HEAD/standard/noarch/kernel-firmware-amdgpu-20201005-334.1.noarch.rpm

Please make sure that you clear /lib/firmware/updates/* and uncompressed /lib/firmware/amdgpu/picasso* files beforehand.

If the boot still fails with this version, let's check again the following:

- Uncompress /lib/firmware/amdgpu/picasso*.xz, and retest
  unxz -f /lib/firmware/amdgpu/picasso*.xz
  mkinitrd
  reboot

If it still fails,
- Compare the contents of those uncompressed picasso* firmware files with the 20.40 version of the tarball in comment 30.  All those must be identical.
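
The comparison can be done with cmp over both directories. A minimal sketch, assuming the installed files have already been uncompressed and the 20.40 directory from the tarball is unpacked somewhere; the function name `compare_fw` is hypothetical. A file that is missing on the second side is reported as different.

```shell
# compare_fw DIR_A DIR_B
# Byte-compare every picasso* file in DIR_A with the same-named file in DIR_B.
# Prints one line per file and returns 1 if any file differs or is missing
# in DIR_B. The function name is made up for this sketch.
compare_fw() {
    dir_a=$1
    dir_b=$2
    rc=0
    for f in "$dir_a"/picasso*; do
        [ -e "$f" ] || continue        # glob matched nothing
        name=${f##*/}                  # strip the directory part
        if cmp -s "$f" "$dir_b/$name"; then
            echo "same      $name"
        else
            echo "DIFFERENT $name"
            rc=1
        fi
    done
    return $rc
}

# Example usage:
#   compare_fw 20.40/amdgpu /lib/firmware/amdgpu
```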
Comment 44 Karl Mistelberger 2020-10-12 07:14:50 UTC
(In reply to Takashi Iwai from comment #43)
> Now can people with a Picasso board test the kernel-firmware-amdgpu package in
> the OBS Kernel:HEAD repo?  I didn't revert the amdgpu firmware there yet, but
> there was an error in the split / copy script, which was fixed first.
>  
> http://download.opensuse.org/repositories/Kernel:/HEAD/standard/noarch/
> kernel-firmware-amdgpu-20201005-334.1.noarch.rpm
> 
> Please make sure that you clear /lib/firmware/updates/* and uncompressed
> /lib/firmware/amdgpu/picasso* files beforehand.

Still fails.

> 
> If the boot still fails with this version, let's check again the following:
> 
> - Uncompress /lib/firmware/amdgpu/picasso*.xz, and retest
>   unxz -f /lib/firmware/amdgpu/picasso*.xz
>   mkinitrd
>   reboot

:~ # unxz -f /lib/firmware/amdgpu/picasso*.xz
unxz: /lib/firmware/amdgpu/picasso_asd.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_ce.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_gpu_info.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_me.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_mec.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_mec2.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_pfp.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_rlc.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_sdma.bin.xz: File format not recognized
unxz: /lib/firmware/amdgpu/picasso_vcn.bin.xz: File format not recognized
:~ #
Comment 45 Takashi Iwai 2020-10-12 08:00:45 UTC
(In reply to Karl Mistelberger from comment #44)
> :~ # unxz -f /lib/firmware/amdgpu/picasso*.xz
> unxz: /lib/firmware/amdgpu/picasso_asd.bin.xz: File format not recognized

What shows the output of below?
  file /lib/firmware/amdgpu/picasso_asd.bin.xz
Comment 46 Karl Mistelberger 2020-10-12 08:03:16 UTC
(In reply to Takashi Iwai from comment #45)
> (In reply to Karl Mistelberger from comment #44)
> > :~ # unxz -f /lib/firmware/amdgpu/picasso*.xz
> > unxz: /lib/firmware/amdgpu/picasso_asd.bin.xz: File format not recognized
> 
> What shows the output of below?
>   file /lib/firmware/amdgpu/picasso_asd.bin.xz

/lib/firmware/amdgpu/picasso_asd.bin.xz: empty


:~ # ll /lib/firmware/amdgpu/picasso*.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_asd.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_ce.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_gpu_info.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_me.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_mec.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_mec2.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_pfp.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_rlc.bin.xz
-rw-r--r--  1 root root 9160 Oct  8 13:23 /lib/firmware/amdgpu/picasso_rlc_am4.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_sdma.bin.xz
-rw-r--r--  1 root root 9548 Oct  8 13:23 /lib/firmware/amdgpu/picasso_ta.bin.xz
--w------- 11 root root    0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_vcn.bin.xz
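
Zero-length files like these are easy to miss, so a quick check before rebuilding the initrd may save a reboot cycle. This is a sketch under the assumption that the firmware sits in /lib/firmware/amdgpu; the function name `empty_fw_files` is made up.

```shell
# empty_fw_files [DIR]
# Print every picasso* regular file in DIR whose size is zero.
# DIR defaults to /lib/firmware/amdgpu; the function name is hypothetical.
empty_fw_files() {
    dir=${1:-/lib/firmware/amdgpu}
    for f in "$dir"/picasso*; do
        # -f: a regular file exists; ! -s: its size is zero
        if [ -f "$f" ] && [ ! -s "$f" ]; then
            echo "$f"
        fi
    done
}

# Example usage:
#   empty_fw_files            # no output means no empty firmware files
```

Running `rpm -V kernel-firmware-amdgpu` should flag the same files, since it verifies installed files against the package metadata.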
Comment 47 Takashi Iwai 2020-10-12 08:05:51 UTC
Hm.  Something wrong with the installation.  Could you retry?  At best:

% rm -rf /lib/firmware/amdgpu
% zypper rm -u --nodeps kernel-firmware-amdgpu
% zypper in --oldpackage -f kernel-firmware-amdgpu-XXX.rpm
  (where XXX is filled with the actual package rpm you've downloaded.)

Then check the file command again.
Comment 48 Karl Mistelberger 2020-10-12 08:11:52 UTC
(In reply to Takashi Iwai from comment #47)
> Hm.  Something wrong with the installation.  Could you retry?  At best:
> 
> % rm -rf /lib/firmware/amdgpu
> % zypper rm -u --nodeps kernel-firmware-amdgpu
> % zypper in --oldpackage -f kernel-firmware-amdgpu-XXX.rpm
>   (where XXX is filled with the actual package rpm you've downloaded.)
> 
> Then check the file command again.

:~ # ll /lib/firmware/amdgpu/picasso*.xz
-rw-r--r-- 3 root root  31836 Oct 11 23:38 /lib/firmware/amdgpu/picasso_asd.bin.xz
-rw-r--r-- 2 root root   3156 Oct 11 23:38 /lib/firmware/amdgpu/picasso_ce.bin.xz
-rw-r--r-- 2 root root    112 Oct 11 23:38 /lib/firmware/amdgpu/picasso_gpu_info.bin.xz
-rw-r--r-- 2 root root   6104 Oct 11 23:38 /lib/firmware/amdgpu/picasso_me.bin.xz
-rw-r--r-- 4 root root  26048 Oct 11 23:38 /lib/firmware/amdgpu/picasso_mec.bin.xz
-rw-r--r-- 4 root root  26048 Oct 11 23:38 /lib/firmware/amdgpu/picasso_mec2.bin.xz
-rw-r--r-- 2 root root   8312 Oct 11 23:38 /lib/firmware/amdgpu/picasso_pfp.bin.xz
-rw-r--r-- 2 root root   9292 Oct 11 23:38 /lib/firmware/amdgpu/picasso_rlc.bin.xz
-rw-r--r-- 1 root root   9160 Oct 11 23:38 /lib/firmware/amdgpu/picasso_rlc_am4.bin.xz
-rw-r--r-- 2 root root   7360 Oct 11 23:38 /lib/firmware/amdgpu/picasso_sdma.bin.xz
-rw-r--r-- 1 root root   9548 Oct 11 23:38 /lib/firmware/amdgpu/picasso_ta.bin.xz
-rw-r--r-- 3 root root 219540 Oct 11 23:38 /lib/firmware/amdgpu/picasso_vcn.bin.xz

:-)
Comment 49 Takashi Iwai 2020-10-12 08:33:51 UTC
OK, now one step forward.  And it still fails to boot?
Comment 50 Karl Mistelberger 2020-10-12 08:44:20 UTC
(In reply to Takashi Iwai from comment #49)
> OK, now one step forward.  And it still fails to boot?

It boots; however, suspend/resume still freezes:

Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
Oct 12 10:38:44 kernel: [drm] Fence fallback timer expired on ring sdma0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
Oct 12 10:38:44 kernel: [drm:process_one_work] *ERROR* ib ring test failed (-110).
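
The numbers in parentheses after the IB test failures (-22 in the original report, -110 here) look like negated Linux errno values, which is the usual kernel convention for returning errors. A minimal sketch for decoding them; decode_errno is a hypothetical helper, and the table is an assumption limited to the two codes that appear in this report, using the Linux x86-64 numbering:

```shell
# decode_errno N: map the errno numbers seen in this report to their names.
# A leading minus sign (kernel-style negative return value) is stripped first.
decode_errno() {
    n=${1#-}
    case "$n" in
        22)  echo "EINVAL (Invalid argument)" ;;
        110) echo "ETIMEDOUT (Connection timed out)" ;;
        *)   echo "unknown ($n)" ;;
    esac
}

decode_errno -22
decode_errno -110
```

Read this way, the first failure would be an invalid-argument error and the later one a timeout, which fits the "Fence fallback timer expired" line just before it.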
Comment 51 Takashi Iwai 2020-10-12 08:55:46 UTC
(In reply to Karl Mistelberger from comment #50)
> (In reply to Takashi Iwai from comment #49)
> > OK, now one step forward.  And it still fails to boot?
> 
> It boots; however, suspend/resume still freezes:

OK, that's more or less expected from the previous results.  We couldn't kill two birds with one stone :)

Now the question is why the boot failure happened with the recent kernel-firmware package.  It might be that the hard-linked files in the package got broken by the update.  Let's wait for the test results from other people, too.

I prepared another set of kernel-firmware packages in the OBS home:tiwai:test:fw-fix repo.  It uses fdupes with the -s option to turn the duplicated files into symlinks instead of hard links, as that might work better.
Please test just a simple update of kernel-firmware-amdgpu package from:
  http://download.opensuse.org/repositories/home:/tiwai:/test:/fw-fix/openSUSE_Factory/noarch/kernel-firmware-amdgpu-20201005-336.1.noarch.rpm

It's just to confirm that switching to symlink doesn't break things.
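
For readers unfamiliar with the difference: a hard link is a second name for the same inode, which is why the second column of the ll output above shows counts such as 2, 3 or 4, while a symlink is a small separate file that only stores the target's name. A minimal sketch in a scratch directory; link_count is a hypothetical helper and the file names are made up for illustration:

```shell
# link_count FILE: print the hard-link count (second column of `ls -l`).
link_count() {
    ls -ld "$1" | awk '{print $2}'
}

# Work in a scratch directory so nothing under /lib/firmware is touched.
scratch=$(mktemp -d)
printf 'dummy firmware\n' > "$scratch/a.bin"
ln "$scratch/a.bin" "$scratch/b.bin"     # hard link: second name, same inode
ln -s a.bin "$scratch/c.bin"             # symlink: separate file naming a.bin

echo "a.bin link count: $(link_count "$scratch/a.bin")"
echo "c.bin points to: $(readlink "$scratch/c.bin")"
rm -rf "$scratch"
```

Adding the symlink does not raise the link count of a.bin; only the hard link does.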
Comment 52 Christian Hartmann 2020-10-12 09:08:52 UTC
I will do a test in the evening after work :-)
Comment 53 Karl Mistelberger 2020-10-12 09:29:21 UTC
(In reply to Takashi Iwai from comment #51)
> Please test just a simple update of kernel-firmware-amdgpu package from:
>  
> http://download.opensuse.org/repositories/home:/tiwai:/test:/fw-fix/
> openSUSE_Factory/noarch/kernel-firmware-amdgpu-20201005-336.1.noarch.rpm

I reverted to broken firmware first by running zypper dup. Then I ran:

The following package is going to be upgraded:
  kernel-firmware-amdgpu

The following package is going to change vendor:
kernel-firmware-amdgpu
  openSUSE -> obs://build.opensuse.org/home:tiwai

1 package to upgrade, 1 to change vendor.
Overall download size: 7.6 MiB. Already cached: 0 B. No additional space will be used or freed after the operation.
Continue? [y/n/v/...? shows all options] (y): Retrieving package kernel-firmware-amdgpu-20201005-336.1.noarch (1/1),   7.6 MiB (  7.6 MiB unpacked)

Boot now works.
Comment 54 Takashi Iwai 2020-10-12 09:35:36 UTC
(In reply to Karl Mistelberger from comment #53)
> (In reply to Takashi Iwai from comment #51)
> > Please test just a simple update of kernel-firmware-amdgpu package from:
> >  
> > http://download.opensuse.org/repositories/home:/tiwai:/test:/fw-fix/
> > openSUSE_Factory/noarch/kernel-firmware-amdgpu-20201005-336.1.noarch.rpm
> 
> I reverted to broken firmware first by running zypper dup.

Actually I wonder whether the boot still fails at this moment after zypper dup.
If the problem was some hard-link mess via the update, it might work magically even without fixing anything else, just because you once uninstalled and cleaned up.

> Then I ran:
> 
> The following package is going to be upgraded:
>   kernel-firmware-amdgpu
> 
> The following package is going to change vendor:
> kernel-firmware-amdgpu
>   openSUSE -> obs://build.opensuse.org/home:tiwai
> 
> 1 package to upgrade, 1 to change vendor.
> Overall download size: 7.6 MiB. Already cached: 0 B. No additional space
> will be used or freed after the operation.
> Continue? [y/n/v/...? shows all options] (y): Retrieving package
> kernel-firmware-amdgpu-20201005-336.1.noarch (1/1),   7.6 MiB (  7.6 MiB
> unpacked)
> 
> Boot now works.

Thanks for the confirmation.  I guess using symlinks is the better option anyway, so let's move in that direction.
Comment 55 Karl Mistelberger 2020-10-12 09:44:25 UTC
(In reply to Takashi Iwai from comment #54)
> Actually I wonder whether the boot still fails at this moment after zypper
> dup.
> If the problem was some hard-link mess via the update, it might work
> magically even without fixing anything else, just because you once
> uninstalled and cleaned up.

Reverted from obs://build.opensuse.org/home:tiwai to openSUSE. Boot works now.
Comment 56 Christian Hartmann 2020-10-12 09:50:09 UTC
(In reply to Takashi Iwai from comment #54)
> (In reply to Karl Mistelberger from comment #53)
> > (In reply to Takashi Iwai from comment #51)
> > > Please test just a simple update of kernel-firmware-amdgpu package from:
> > >  
> > > http://download.opensuse.org/repositories/home:/tiwai:/test:/fw-fix/
> > > openSUSE_Factory/noarch/kernel-firmware-amdgpu-20201005-336.1.noarch.rpm
> > 
> > I reverted to broken firmware first by running zypper dup.
> 
> Actually I wonder whether the boot still fails at this moment after zypper
> dup.
> If the problem was some hard-link mess via the update, it might work
> magically even without fixing anything else, just because you once
> uninstalled and cleaned up.

If I'm not wrong, this would also explain the behaviour I faced when trying the old firmware releases and switching back to 20.40 still worked fine.
Comment 57 Christian Hartmann 2020-10-12 17:52:25 UTC
So, I've just checked going back to the official released version and boot fails.

(In reply to Takashi Iwai from comment #51)
> I prepared another kernel-firmware packages in OBS home:tiwai:test:fw-fix
> repo.  It uses fdupes with -s option to make the duped file symlinks instead
> of hard-links, as it might work better.
> Please test just a simple update of kernel-firmware-amdgpu package from:
>  
> http://download.opensuse.org/repositories/home:/tiwai:/test:/fw-fix/
> openSUSE_Factory/noarch/kernel-firmware-amdgpu-20201005-336.1.noarch.rpm
> 
> It's just to confirm that switching to symlink doesn't break things.

Using this version I was able to boot.
Comment 58 Takashi Iwai 2020-10-12 18:02:44 UTC
(In reply to Christian Hartmann from comment #57)
> So, I've just checked going back to the official released version and boot
> fails.

OK.  Did you uninstall kernel-firmware-amdgpu package once?  Or only upgrade/downgrade the package?  If it's the latter case, try the following:

- Uninstall kernel-firmware-amdgpu once
  % zypper rm -u kernel-firmware-amdgpu

- Remove stale files in /lib/firmware/amdgpu, if any
  % rm -rf /lib/firmware/amdgpu

- Install the kernel-firmware-amdgpu package again from TW
  % zypper in kernel-firmware-amdgpu-20201005
  (pass some option to specify the repo or specify the proper rpm release number to get the TW package.)

I guess this would make it work on your system, too.

Anyway, it seems that the symlink version of fdupes works better, and I'm going to submit it now.
Comment 59 Christian Hartmann 2020-10-12 18:29:44 UTC
(In reply to Takashi Iwai from comment #58)
> (In reply to Christian Hartmann from comment #57)
> > So, I've just checked going back to the official released version and boot
> > fails.
> 
> OK.  Did you uninstall kernel-firmware-amdgpu package once?  Or only
> upgrade/downgrade the package?  If it's the latter case, try the following:
> 
> - Uninstall kernel-firmware-amdgpu once
>   % zypper rm -u kernel-firmware-amdgpu
> 
> - Remove stale files in /lib/firmware/amdgpu, if any
>   % rm -rf /lib/firmware/amdgpu
> 
> - Install the kernel-firmware-amdgpu package again from TW
>   % zypper in kernel-firmware-amdgpu-20201005
>   (pass some option to specify the repo or specify the proper rpm release
> number to get the TW package.)
> 
> I guess this would make it working for yours, too.
> 
> In anyway, it seems that the symlink version of fdupes works better, and I'm
> going to submit it now.

Yes, after uninstalling and deleting the files my system boots normally with the version from the official repo
Comment 60 OBSbugzilla Bot 2020-10-12 18:40:07 UTC
This is an autogenerated message for OBS integration:
This bug (1177428) was mentioned in
https://build.opensuse.org/request/show/841342 Factory / kernel-firmware
Comment 61 Zbigniew Luszpinski 2020-10-13 23:28:52 UTC
I confirm. For me after uninstalling and deleting /lib/firmware/amdgpu and reinstalling official package my system boots normally with the version from the official repo.

Previously I reinstalled this package a few times and ran mkinitrd -f, but this did not help, so removing /lib/firmware/amdgpu before reinstalling the official package may be crucial; I had never done this before Takashi said to do it.

Hint: maybe the preinstall script should do the cleanup: rm -rf /lib/firmware/amdgpu ? This is the second time I have had to clean up and reinstall the amdgpu firmware package, so the problem does come back sometimes.
Comment 62 Takashi Iwai 2020-10-14 07:55:29 UTC
Cleaning the directory can be dangerous during an upgrade, so I'd like to avoid the hackish way.

The hard links were created by the fdupes call for the duplicated files, and this seems to be causing problems.  As mentioned above, I have now changed the call to use the -s option, which creates symlinks instead of hard links, and this should work around the pitfall.  The fixed package is on its way to TW.

So, let's go back to the original issue, the resume failure.  If the boot hang still happens with the next update package, please open another bug and track it there.  Thanks.
Comment 63 Fabian Vogt 2020-10-15 08:45:26 UTC
(In reply to Takashi Iwai from comment #62)
> Cleaning the directory can be dangerous at upgrading, so I'd like to avoid
> the hackish way.
> 
> The hardlink was established by fdupes call for the duplicated files, and
> this seems causing problems.  As mentioned in the above, now I changed the
> call with -s option to use symlink instead of hardlink, and this must work
> around the pitfall.  The fixed package is on its way to TW.

This looks very much like bug 1175025. This is fixed in RPM 4.16, but breaks too many package builds so won't be checked in soon. I'll try to convince mls to backport instead.
Comment 64 OBSbugzilla Bot 2020-10-19 10:30:14 UTC
This is an autogenerated message for OBS integration:
This bug (1177428) was mentioned in
https://build.opensuse.org/request/show/842515 Factory / rpm
Comment 65 Karl Mistelberger 2020-10-19 10:40:48 UTC
Fedora vmlinuz-5.6.6-300.fc32.x86_64 suspend/resume works well.
Comment 66 Takashi Iwai 2020-10-19 10:43:28 UTC
(In reply to Karl Mistelberger from comment #65)
> Fedora vmlinuz-5.6.6-300.fc32.x86_64 suspend/resume works well.

Did you test the kernel with your current openSUSE system?
Also what about more recent kernels?
Comment 67 Karl Mistelberger 2020-10-19 10:48:04 UTC
(In reply to Takashi Iwai from comment #66)
> (In reply to Karl Mistelberger from comment #65)
> > Fedora vmlinuz-5.6.6-300.fc32.x86_64 suspend/resume works well.
> 
> Did you test the kernel with your current openSUSE system?
> Also what about more recent kernels?

Operating System: openSUSE Tumbleweed 20201014
KDE Plasma Version: 5.20.0
KDE Frameworks Version: 5.75.0
Qt Version: 5.15.1
Kernel Version: 5.9.1-1.g8abc535-default
OS Type: 64-bit
Processors: 8 × AMD Ryzen 5 3400G with Radeon Vega Graphics
Memory: 29.3 GiB of RAM
Graphics Processor: AMD RAVEN

kernel-stable freezes too:

3400G:~ # zypper se -is kernel-default
Loading repository data...
Reading installed packages...

S  | Name           | Type    | Version             | Arch   | Repository
---+----------------+---------+---------------------+--------+------------------
i+ | kernel-default | package | 5.8.15-1.1.gc680e93 | x86_64 | (System Packages)
i+ | kernel-default | package | 5.9.1-1.1.g8abc535  | x86_64 | kernel-stable
3400G:~ #
Comment 68 Takashi Iwai 2020-10-19 10:55:41 UTC
(In reply to Karl Mistelberger from comment #67)
> (In reply to Takashi Iwai from comment #66)
> > (In reply to Karl Mistelberger from comment #65)
> > > Fedora vmlinuz-5.6.6-300.fc32.x86_64 suspend/resume works well.
> > 
> > Did you test the kernel with your current openSUSE system?
> > Also what about more recent kernels?
> 
> Operating System: openSUSE Tumbleweed 20201014
> KDE Plasma Version: 5.20.0
> KDE Frameworks Version: 5.75.0
> Qt Version: 5.15.1
> Kernel Version: 5.9.1-1.g8abc535-default

I meant the recent *Fedora* kernel.  They must ship the newer version than the tad old 5.6.y.  And I don't know yet what you exactly tested with Fedora kernel...

> OS Type: 64-bit
> Processors: 8 × AMD Ryzen 5 3400G with Radeon Vega Graphics
> Memory: 29.3 GiB of RAM
> Graphics Processor: AMD RAVEN
> 
> kernel-stable freezes too:

Do you mean the freeze at boot, or freeze after resume?
Comment 69 Karl Mistelberger 2020-10-19 11:06:56 UTC
(In reply to Takashi Iwai from comment #68)
> (In reply to Karl Mistelberger from comment #67)
> > (In reply to Takashi Iwai from comment #66)
> > > (In reply to Karl Mistelberger from comment #65)
> > > > Fedora vmlinuz-5.6.6-300.fc32.x86_64 suspend/resume works well.
> > > 
> > > Did you test the kernel with your current openSUSE system?
> > > Also what about more recent kernels?
> > 
> > Operating System: openSUSE Tumbleweed 20201014
> > KDE Plasma Version: 5.20.0
> > KDE Frameworks Version: 5.75.0
> > Qt Version: 5.15.1
> > Kernel Version: 5.9.1-1.g8abc535-default
> 
> I meant the recent *Fedora* kernel.  They must ship the newer version than
> the tad old 5.6.y.  And I don't know yet what you exactly tested with Fedora
> kernel...

I tested suspend/resume with Fedora, Manjaro and openSUSE here:

3400G:~ # inxi -SMCG
System:    Host: 3400G Kernel: 5.9.1-1.g8abc535-default x86_64 bits: 64 Console: tty 2 Distro: openSUSE Tumbleweed 20201014 
Machine:   Type: Desktop Mobo: Gigabyte model: B450 AORUS ELITE v: x.x serial: N/A UEFI: American Megatrends v: F51 
           date: 12/18/2019 
CPU:       Topology: Quad Core model: AMD Ryzen 5 3400G with Radeon Vega Graphics bits: 64 type: MT MCP L2 cache: 2048 KiB 
           Speed: 1291 MHz min/max: 1400/3700 MHz Core speeds (MHz): 1: 1361 2: 1328 3: 1300 4: 1309 5: 1258 6: 1368 7: 1302 
           8: 1342 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Picasso driver: amdgpu v: kernel 
           Display: server: X.Org 1.20.9 driver: amdgpu FAILED: ati unloaded: fbdev,modesetting,vesa 
           resolution: 1920x1080~60Hz 
           OpenGL: renderer: AMD RAVEN (DRM 3.39.0 5.9.1-1.g8abc535-default LLVM 10.0.1) v: 4.6 Mesa 20.1.8 
3400G:~ # 

> Do you mean the freeze at boot, or freeze after resume?

Freeze after suspend/resume with openSUSE. No issues with Fedora and Manjaro. I will try to test a newer Fedora kernel.

Tested openSUSE: 5.8.14-1.2, 5.8.15-1.1.gc680e93, 5.9.1-1.1.g8abc535
Comment 70 Takashi Iwai 2020-10-19 11:31:32 UTC
OK, thanks.  I asked it because it's possibly a different user-space thing triggering the bug, not the kernel itself.

If needed, we could try to build a kernel with the same config as the other distros that work, and confirm whether it still works or not.
Comment 71 Karl Mistelberger 2020-10-19 11:38:06 UTC
Fedora vmlinuz-5.8.15-201.fc32.x86_64 happily resumes from suspend.
Comment 72 Takashi Iwai 2020-10-19 11:41:46 UTC
(In reply to Karl Mistelberger from comment #71)
> Fedora vmlinuz-5.8.15-201.fc32.x86_64 happily resumes from suspend.

I'm not sure whether you can deploy the Fedora kernel package onto an openSUSE system, but it might be worth a try.  Just install it via "rpm -ivh xxx.rpm --nodeps", run "mkinitrd" manually, and see whether it boots up.
Comment 73 Karl Mistelberger 2020-10-19 12:04:02 UTC
(In reply to Takashi Iwai from comment #72)
> (In reply to Karl Mistelberger from comment #71)
> > Fedora vmlinuz-5.8.15-201.fc32.x86_64 happily resumes from suspend.
> 
> I'm not sure whether you can deploy the Fedora kernel package onto openSUSE
> system, but it might be worth to try.  Just install it via "rpm -ivh xxx.rpm
> --nodeps", run "mkinitrd" manually, and see whether it boots up.

I ran  curl https://repos.fedorapeople.org/repos/thl/kernel-vanilla.repo > /etc/zypp/repos.d/kernel-vanilla.repo

However I am lost with:

3400G:~ # zypper se -s kernel-vanilla
Error building the cache:
[kernel-vanilla-mainline|http://repos.fedorapeople.org/repos/thl/kernel-vanilla-mainline/fedora-20201014/x86_64/] Valid metadata not found at specified URL
History:
 - [kernel-vanilla-mainline|http://repos.fedorapeople.org/repos/thl/kernel-vanilla-mainline/fedora-20201014/x86_64/] Repository type can't be determined.

Warning: Skipping repository 'Linux vanilla kernels from mainline series' because of the above error.
Some of the repositories have not been refreshed because of an error.

3400G:~ # zypper lr kernel-vanilla-mainline
Alias          : kernel-vanilla-mainline
Name           : Linux vanilla kernels from mainline series
URI            : http://repos.fedorapeople.org/repos/thl/kernel-vanilla-mainline/fedora-20201014/x86_64/
Enabled        : Yes
GPG Check      : ( p) Yes
Priority       : 99 (default priority)
Autorefresh    : Off
Keep Packages  : Off
Type           : NONE
GPG Key URI    : https://repos.fedorapeople.org/repos/thl/RPM-GPG-KEY-knurd-kernel-vanilla
Path Prefix    : 
Parent Service : 
Keywords       : ---
Repo Info Path : /etc/zypp/repos.d/kernel-vanilla.repo
MD Cache Path  : /var/cache/zypp/raw/kernel-vanilla-mainline
3400G:~ #
Comment 74 Takashi Iwai 2020-10-19 12:35:42 UTC
It's better not to add the repo but just download the target *.rpm file and install it directly.
Comment 75 Karl Mistelberger 2020-10-19 14:52:13 UTC
(In reply to Takashi Iwai from comment #74)
> It's better not to add repo but just download the target *.rpm file and
> install it directly.

Here we are:

3400G:/var/cache/zypp/packages/Fedora # rpm -ivh * --nodeps
warning: kernel-5.9.1-36.vanilla.1.fc32.x86_64.rpm: Header V4 RSA/SHA256 Signature, key ID 863625fa: NOKEY
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:kernel-core-5.9.1-36.vanilla.1.fc################################# [ 20%]
   2:kernel-modules-5.9.1-36.vanilla.1################################# [ 40%]
   3:kernel-5.9.1-36.vanilla.1.fc32   ################################# [ 60%]
   4:kernel-modules-extra-5.9.1-36.van################################# [ 80%]
   5:kernel-modules-internal-5.9.1-36.################################# [100%]
/var/tmp/rpm-tmp.Bw5jCe: line 1: /bin/kernel-install: No such file or directory
warning: %posttrans(kernel-core-5.9.1-36.vanilla.1.fc32.x86_64) scriptlet failed, exit status 127
3400G:/var/cache/zypp/packages/Fedora #
Comment 76 Takashi Iwai 2020-10-19 15:25:40 UTC
You can create the initrd manually like
  % /sbin/mkinitrd -k vmlinuz-5.9.... -i initrd-5.9....

where 5.9.... is the file name of /boot/vmlinuz-* corresponding to the Fedora kernel you've installed.  Once mkinitrd has succeeded, update the GRUB entry like
   % /sbin/update-bootloader --add --image /boot/vmlinuz-5.9.... --initrd /boot/initrd-5.9....
   % /usr/sbin/grub2-mkconfig -o /boot/grub2/grub.cfg

Then reboot with that kernel with fingers crossed.
Comment 77 Karl Mistelberger 2020-10-19 16:23:50 UTC
As the postinstall scripts fail, no kernel is generated. :-(
Comment 78 Takashi Iwai 2020-10-19 16:34:40 UTC
(In reply to Karl Mistelberger from comment #77)
> As the postinstall scripts fail no kernel is generated. :-(

Then try to install with rpm --noscripts option.
Comment 79 Karl Mistelberger 2020-10-21 04:56:25 UTC
Snapshot 20201019 fixes the freeze on resume from suspend.
Comment 80 Takashi Iwai 2020-10-21 08:07:05 UTC
Interesting, so it's likely either the kernel update to 5.9.x or the fix of kernel-firmware-amdgpu took effect.

Anyway, it's good to hear that the issue is gone :)
Comment 81 Karl Mistelberger 2020-10-21 21:11:23 UTC
(In reply to Takashi Iwai from comment #80)
> Interesting, so it's likely either the kernel update to 5.9.x or the fix of
> kernel-firmware-amdgpu took effect.
 
There is the old kernel in http://download.opensuse.org/tumbleweed/repo/oss/ and new firmware in http://download.opensuse.org/update/tumbleweed/

i+ | kernel-default              | package | 5.8.14-1.2   | x86_64 | Haupt-Repository (OSS)
i+ | kernel-firmware-all         | package | 20201005-3.1 | noarch | Hauptaktualisierungs-Repository
i+ | kernel-firmware-amdgpu      | package | 20201005-3.1 | noarch | Hauptaktualisierungs-Repository

...

i  | kernel-firmware-usb-network | package | 20201005-3.1 | noarch | Hauptaktualisierungs-Repository
i  | purge-kernels-service       | package | 0-7.2        | noarch | Haupt-Repository (OSS)
Comment 82 Takashi Iwai 2020-10-23 16:16:21 UTC
I guess it's the kernel-firmware workaround, but hey, who knows :)

Anyway, let's assume that it'll keep working and close this now.
Feel free to reopen if you encounter the same problem again.  Thanks.
Comment 84 Karl Mistelberger 2020-10-29 15:01:47 UTC
(In reply to Takashi Iwai from comment #82)
> I guess it's kernel-firmware workaround, but hey, who knows :)
> 
> In anyway, assume that it'll keep working, and let's close now.
> Feel free to reopen if you encounter the same problem again.  Thanks.

Changed the monitor and the freeze upon suspend/resume is back:

3400G:~ # hwinfo --monitor
35: None 00.0: 10002 LCD Monitor                                
  [Created at monitor.125]
  Unique ID: rdCR.K1i5gxVmsEC
  Parent ID: GBI1.Tt0a+NI8vi1
  Hardware Class: monitor
  Model: "SAMSUNG LU28R55"
  Vendor: SAM "SAMSUNG"
  Device: eisa 0x1017 "LU28R55"
  Serial ID: "H4ZN302578"
  Resolution: 720x400@70Hz
  Resolution: 640x480@60Hz
  Resolution: 640x480@67Hz
  Resolution: 640x480@72Hz
  Resolution: 640x480@75Hz
  Resolution: 800x600@56Hz
  Resolution: 800x600@60Hz
  Resolution: 800x600@72Hz
  Resolution: 800x600@75Hz
  Resolution: 832x624@75Hz
  Resolution: 1024x768@60Hz
  Resolution: 1024x768@70Hz
  Resolution: 1024x768@75Hz
  Resolution: 1280x1024@75Hz
  Resolution: 1152x864@75Hz
  Resolution: 1280x720@60Hz
  Resolution: 1280x1024@60Hz
  Resolution: 3840x2160@60Hz
  Size: 632x360 mm
  Year of Manufacture: 2038
  Week of Manufacture: 50
  Detailed Timings #0:
     Resolution: 3840x2160
     Horizontal: 3840 4016 4104 4400 (+176 +264 +560) +hsync
       Vertical: 2160 2168 2178 2250 (+8 +18 +90) +vsync
    Frequencies: 594.00 MHz, 135.00 kHz, 60.00 Hz
  Driver Info #0:
    Max. Resolution: 3840x2160
    Vert. Sync Range: 50-75 Hz
    Hor. Sync Range: 30-135 kHz
    Bandwidth: 594 MHz
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #12 (VGA compatible controller)

3400G:~ # journalctl -b -3 --grep amdgpu -o short-monotonic -p err 
-- Logs begin at Wed 2020-10-21 16:58:25 CEST, end at Thu 2020-10-29 16:00:08 CET. --
[  274.870164] 3400G kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-22).
[  377.490546] 3400G kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
3400G:~ #
Comment 85 Takashi Iwai 2020-10-29 15:05:19 UTC
OK, then please report and track the bug on the upstream bug tracker, e.g. the gitlab.freedesktop.org issues.  The package bug must have been fixed, so the rest is purely a driver or firmware bug, which we can't help much with from the distro side.
Comment 86 Karl Mistelberger 2020-10-30 06:05:49 UTC
(In reply to Takashi Iwai from comment #85)
> OK, then please report and track the bug on the upstream bug tracker, e.g.
> the gitlab.freedesktop.org issues.  The package bug must have been fixed, so
> the rest is pure the driver or the firmware bug, which we can't help much
> from distro side.

I did so: https://gitlab.freedesktop.org/drm/amd/-/issues/1354

However this morning I found suspend/resume doesn't work anymore with the old monitor. It worked with firmware in http://download.opensuse.org/update/tumbleweed/, but that's now gone and http://download.opensuse.org/tumbleweed/repo/oss/ is now used:

3400G:~ # zypper se -is kernel-firmware-amdgpu
Loading repository data...
Reading installed packages...

S  | Name                   | Type    | Version      | Arch   | Repository
---+------------------------+---------+--------------+--------+--------------------------------
i+ | kernel-firmware-amdgpu | package | 20201005-3.1 | noarch | Hauptaktualisierungs-Repository
3400G:~ #
Comment 87 Karl Mistelberger 2020-11-04 06:23:54 UTC
Tested the following versions of kernel-firmware-amdgpu so far:

20200207-1.1
20200302-1.1
20200519-2.1
20200610-1.1
20200807-1.2
20200916-1.1
20201005-1.1
20201005-3.1
20201005-334.1
20201005-336.1
20201005-36.1
20201023-2.1

All of them fail on suspend to RAM/resume. The newest 20201023-2.1 adds some additional trouble:

3400G:~ # journalctl -b -p err
-- Logs begin at Fri 2020-10-30 05:53:09 CET, end at Wed 2020-11-04 07:21:17 CET. --
Nov 04 07:10:09 3400G kernel: pci 0000:00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.
Nov 04 07:10:09 3400G systemd-modules-load[221]: Failed to find module 'platform-integrity'
Nov 04 07:10:11 3400G systemd-modules-load[493]: Failed to find module 'platform-integrity'
Nov 04 07:10:12 3400G kernel: kvm: disabled by bios
Nov 04 07:10:12 3400G kernel: kvm: disabled by bios
Nov 04 07:10:12 3400G kernel: kvm: disabled by bios
Nov 04 07:10:12 3400G kernel: kvm: disabled by bios
Nov 04 07:10:13 3400G kernel: kvm: disabled by bios
Nov 04 07:10:13 3400G kernel: kvm: disabled by bios
Nov 04 07:10:13 3400G kernel: kvm: disabled by bios
Nov 04 07:10:13 3400G kernel: kvm: disabled by bios
Nov 04 07:10:24 3400G kmail[2641]: No text-to-speech plug-ins were found.
3400G:~ #
Comment 89 Nikolai Nikolaevskii 2021-02-23 19:13:58 UTC
Using AMD Ryzen 3 3200G.
In spring 2020 I was using Leap 15.1.
To use the built-in graphics I needed a kernel newer than the 4.12 from Leap 15.1.
So I used kernels from the kernel:stable repo.
With Leap 15.1 + kernel 5.5.x suspend to RAM worked OK.
With Leap 15.1 + kernel 5.6.x suspend to RAM stopped working.
Then I used kernel 5.3 for Leap 15.1 from the Leap 15.2 developers repo to get suspend to RAM working.
Now suspend to RAM is working OK with Leap 15.2 and standard 5.3 kernel.
Comment 90 Karl Mistelberger 2021-03-13 08:52:15 UTC
(In reply to Nikolai Nikolaevskii from comment #89)
> Using AMD Ryzen 3 3200G.
> In spring 2020 I was using Leap 15.1.
> To use built-in graphics I needed kernel newer than 4.12 from Leap 15.1.
> So I used kernels from kernel:stable repo.
> With Leap 15.1 + kernel 5.5.x suspend to RAM worked OK.
> With Leap 15.1 + kernel 5.6.x suspend to RAM stopped to work.
> Then I used kernel 5.3 for Leap 15.1 from Leap 15.2 developers repo to get
> suspend  to RAM working.
> Now suspend to RAM is working OK with Leap 15.2 and standard 5.3 kernel.

I tried Leap 5.3.18-lp152.66-default and still get the following messages on suspend to RAM:

Mar 13 09:46:17 Leap kernel: Non-boot CPUs are not disabled
Mar 13 09:46:17 Leap kernel: amdgpu 0000:08:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-22).
Mar 13 09:46:17 Leap kernel: [drm:process_one_work] *ERROR* ib ring test failed (-22).

So I am wondering what your exact kernel version is.

Mine are:

i+ | kernel-default              | package | 5.3.18-lp152.66.2 | x86_64 | Hauptaktualisierungs-Repository
i+ | kernel-default              | package | 5.3.18-lp152.63.1 | x86_64 | Hauptaktualisierungs-Repository
i+ | kernel-firmware-all         | package | 20201120-35.1     | noarch | (System Packages)
Comment 91 Takashi Iwai 2021-03-13 09:15:27 UTC
It's hard to say.  We used to have a workaround (keeping the old firmware file) for Vega10 in Leap 15.1 and Leap 15.2, but this was dropped in TW (hence also Kernel:HEAD and Kernel:stable) as well as Leap 15.3.

So, if you have the kernel-firmware-all package on your system (not kernel-firmware), it means you have the latest firmware from TW/Kernel:HEAD, and the workaround in the firmware is gone.  And, IIRC, this problem depends on the hardware setup such as the backlight level, so upstream couldn't reproduce the issue.

If you see the problem with the latest TW kernel and with the latest kernel-firmware-amdgpu package, you should report the problem to upstream and resolve the bug there first.
Comment 92 Karl Mistelberger 2021-03-13 10:09:50 UTC
(In reply to Takashi Iwai from comment #91)
> If you see the problem with the latest TW kernel and with the latest
> kernel-firmware-amdgpu package, you should report the problem to upstream
> and resolve the bug there at first.

I reported the bug here: https://gitlab.freedesktop.org/drm/amd/-/issues/1354

But I am still waiting for a response. Any idea how to proceed?
Comment 93 Nikolai Nikolaevskii 2021-03-28 17:27:41 UTC
(In reply to Takashi Iwai from comment #91)
> It's hard to say.  We used to have a workaround (keeping the old firmware
> file) for Vega10 in Leap 15.1 and Leap 15.2, but this was dropped in TW
> (hence also Kernel:HEAD and Kernel:stable) as well as Leap 15.3.
> 
> So, if you have kernel-firmware-all package on your system (not
> kernel-firmware), it means you having the latest firmware from
> TW/Kernel:HEAD, and the workaround in the firmware was gone.  And, IIRC,
> this problem depends on the hardware setup such as the backlight level, so
> the upstream couldn't reproduce the issue.
> 
> If you see the problem with the latest TW kernel and with the latest
> kernel-firmware-amdgpu package, you should report the problem to upstream
> and resolve the bug there at first.

We can get firmware files from amdgpu-pro drivers, package "RPMS/noarch/amdgpu-dkms-firmware*".
The latest 20.50: https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-20-50
20.40: https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-20-40
20.10: https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-20-10
19.50: https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux

Change last numbers to get another version.

What files to use? vega10*.bin or vega12*.bin or vega20*.bin or vegam*.bin?
Ryzen 3200G has Vega 8, Ryzen 3400G has Vega 11 (Radeon™ RX Vega 11 Graphics).

To OP (Karl Mistelberger): try to use firmware from amdgpu-pro-20.10.
Comment 94 Karl Mistelberger 2021-04-06 17:36:22 UTC
I am unsure:

- which file to download

- which package to install

I downloaded amdgpu-pro-20.50-1234663-sle-15.2.tar.xz and inspected RPMS/noarch/amdgpu-dkms-firmware-5.9.10.69-1234663.noarch.rpm for vega11 files, to no avail.
Comment 95 Karl Mistelberger 2021-04-11 06:52:45 UTC
Tested firmware versions 20.10 and 20.50. Suspend to RAM fails with both versions. With 20.50 the machine hangs. With 20.10 it recovers, see attachment.
Comment 96 Karl Mistelberger 2021-04-11 06:54:16 UTC
Created attachment 848217 [details]
journal suspend to RAM
Comment 97 Nikolai Nikolaevskii 2021-05-15 16:00:34 UTC
OMG, we need the picasso_*.bin files, not the vega*.bin files! (For a Ryzen APU with Vega graphics; 12 files in my case.)
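
To check which Picasso firmware files are actually present, something like the following could be used. It is only a sketch under stated assumptions: the helper name find_firmware is hypothetical, /lib/firmware/amdgpu is assumed to be the firmware directory, and the glob also accepts compressed files (such as .bin.xz) in case the distribution ships them that way.

```python
from pathlib import Path


def find_firmware(firmware_dir, prefix="picasso"):
    """Return the sorted names of firmware files for one ASIC prefix.

    Matches plain files (prefix_*.bin) as well as compressed ones
    (for example prefix_*.bin.xz). Returns an empty list if the
    directory does not exist.
    """
    directory = Path(firmware_dir)
    if not directory.is_dir():
        return []
    return sorted(
        path.name
        for path in directory.glob(f"{prefix}_*.bin*")
        if path.is_file()
    )


# Assumed default location of the amdgpu firmware on the installed system.
for name in find_firmware("/lib/firmware/amdgpu"):
    print(name)
```

Running the same function on the directory unpacked from the amdgpu-dkms-firmware package allows a comparison of the two file lists before copying anything.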
Comment 98 Nikolai Nikolaevskii 2021-05-30 08:16:32 UTC
ILL OP solved his problem by changing motherboard from Gigabyte B450 Aorus Elite to Asus PRIME B450-PLUS.
My Asus X570 + Picasso AMD Ryzen 3200G suspends to RAM OK.

Possible reasons:
1. EFI firmware.
2. Problems with LED subsystem.

https://forums.opensuse.org/showthread.php/553786-AMDGPU-errors-and-occasional-crashes-hangs?p=3031924#post3031924
https://forums.opensuse.org/showthread.php/553786-AMDGPU-errors-and-occasional-crashes-hangs?p=3032042#post3032042
Comment 100 Karl Mistelberger 2021-06-27 13:46:30 UTC
(In reply to Nikolai Nikolaevskii from comment #98)
> ILL OP solved his problem by changing motherboard from Gigabyte B450 Aorus
> Elite to Asus PRIME B450-PLUS.
> My Asus X570 + Picasso AMD Ryzen 3200G suspends to RAM OK.
> 
> Possible reasons:
> 1. EFI firmware.
> 2. Problems with LED subsystem.

The ASUSTeK PRIME B450-PLUS has suspended and resumed flawlessly since I moved from the Gigabyte B450 Aorus.

Spurious GPU crashes with IO_PAGE_FAULTs have been observed.
Comment 101 Felipe Martinez 2021-11-26 16:50:33 UTC
This was fixed months ago and just started happening again I believe in 5.14?
Comment 102 Takashi Iwai 2021-11-29 16:16:09 UTC
(In reply to Felipe Martinez from comment #101)
> This was fixed months ago and just started happening again I believe in 5.14?

Do you mean experiencing it again on your machine?  Details please.
Comment 103 Felipe Martinez 2021-11-29 23:53:09 UTC
That's exactly right, Takashi. I originally tested out a fix through the PBS with your help and a particular set of 6 patches I had found in some posts online. Eventually it all worked great; then, starting with I believe 5.14 (maybe 5.13?), it stopped sleeping again.

This is a Lenovo Ideapad3 with a 4500U (perhaps a 5500U; I forget, and I'm out of town right now).

It sometimes goes to sleep fine but doesn't wake up, and sometimes it refuses to go to sleep and just goes dark with a steady white light (not blinking).


What I can try doing is recompiling a kernel with the 6 patches I had gone with originally and see if that brings us back to working fashion.
Comment 104 Takashi Iwai 2021-12-06 10:56:05 UTC
(In reply to Felipe Martinez from comment #103)
> What I can try doing is recompiling a kernel with the 6 patches I had gone
> with originally and see if that brings us back to working fashion.

That'd be appreciated.  Let us know the result!

If the problem is persistent, it may be worth opening another bug report and tracking it there instead of sticking with this one.
Comment 105 Jiri Slaby 2023-01-25 11:41:04 UTC
Does this still happen? It looks like the bug has bitrotted after such a long time :/.
Comment 106 Karl Mistelberger 2023-01-25 11:53:30 UTC
(In reply to Jiri Slaby from comment #105)
> Does this still happen? It looks like the bug has bitrotted after such a long time :/.

I replaced the motherboard with a different model, which is working properly. I think the bug survived. However, I sold the old board and can't verify this.

A new bug popped up on the new hardware. It is less severe but annoying anyway, and presumably a very robust one: https://bugzilla.opensuse.org/show_bug.cgi?id=1206864
Comment 107 Jiri Slaby 2023-01-26 10:07:04 UTC
OK, without the ability to investigate further, let's close this until someone else hits it.