Bugzilla – Full Text Bug Listing |
Summary: | AMDGPU resume fail | ||
---|---|---|---|
Product: | [openSUSE] openSUSE Tumbleweed | Reporter: | Karl Mistelberger <karl.mistelberger> |
Component: | Kernel | Assignee: | openSUSE Kernel Bugs <kernel-bugs> |
Status: | RESOLVED WORKSFORME | QA Contact: | E-mail List <qa-bugs> |
Severity: | Normal | ||
Priority: | P5 - None | CC: | 2012gdwu, alexander.deucher, chris-hartmann, felipefm, fvogt, jslaby, karl.mistelberger, kaykaykay123, patrik.jakobsson, tiwai, tzimmermann |
Version: | Current | ||
Target Milestone: | --- | ||
Hardware: | x86-64 | ||
OS: | openSUSE Tumbleweed | ||
URL: | https://gitlab.freedesktop.org/drm/amd/-/issues/1354 | ||
Whiteboard: | |||
Found By: | --- | Services Priority: | |
Business Priority: | Blocker: | --- | |
Marketing QA Status: | --- | IT Deployment: | --- |
Attachments: |
old boot
new boot
new boot
last good boot
fresh install of snapshot 20201007
snapshot 20201008 even worse
20201008 with firmware 20201007
still freezing
Picasso firmware files
dmesg output
journal suspend to RAM |
Description
Karl Mistelberger
2020-10-07 11:31:42 UTC
Hmm. Kernel issue. Is this a regression?
(In reply to Takashi Iwai from comment #2) > Is this a regression?
I assembled the machine in July. IIRC suspend/resume worked then. Later I ran into trouble with graphics, which went away with further updating: https://forums.opensuse.org/showthread.php/544219-Amdgpu-Trouble
drm is a great idea, but to me it seems it still has teething problems.
But how about this suspend/resume problem? Is it a new issue that worked in the past version?
(In reply to Takashi Iwai from comment #4) > But how about this suspend/resume problem? Is it a new issue that worked in > the past version?
journal says: .... Jul 21 08:14:20 localhost systemd[1]: Condition check resulted in Load Kernel Module drm being skipped. ....
Thus I presume suspend/resume worked because drm was not used. Now it gets loaded and fails. BTW: How can I skip loading drm for the time being?
You *did* use amdgpu driver in the early usage, no? The message might be a red herring. So, please clarify the situation:
- Which Tumbleweed snapshot it worked in which configuration
- What exactly gets broken now: tell the procedure; how to suspend and how to resume?
- How is your configuration now
We really need to identify the difference if it worked somehow in the past.
Created attachment 842420 [details]
old boot
(In reply to Takashi Iwai from comment #6) > You *did* use amdgpu driver in the early usage, no? The message might be a > red herring. I am unsure. I did a plain install without further tinkering. > Which Tumbleweed snapshot it worked in which configuration Snapshots are gone, but Tumbleweed was kept up to date by running "zypper dup". See attached journal. > - What exactly gets broken now: tell the procedure; how to suspend and how > to resume? From the journal: [ 89.306023] systemd[1]: Starting Suspend... [ 89.315625] systemd-sleep[2156]: INFO: Skip running /usr/lib/systemd/system-sleep/grub2.sleep for suspend [ 89.316552] systemd-sleep[2151]: Suspending system... [ 100.329743] systemd-sleep[2151]: System resumed. [ 100.333370] systemd-sleep[2245]: INFO: Skip running /usr/lib/systemd/system-sleep/grub2.sleep for suspend [ 100.334508] systemd[1]: systemd-suspend.service: Succeeded. [ 100.334894] systemd[1]: Finished Suspend. > - How is your configuration now Current system is: Operating System: openSUSE Tumbleweed 20201005 KDE Plasma Version: 5.19.5 KDE Frameworks Version: 5.74.0 Qt Version: 5.15.1 Kernel Version: 5.8.12-1-default OS Type: 64-bit Processors: 8 × AMD Ryzen 5 3400G with Radeon Vega Graphics Memory: 29.3 GiB of RAM Graphics Processor: AMD RAVEN See also attached journal. Created attachment 842421 [details]
new boot
(In reply to Karl Mistelberger from comment #8) > (In reply to Takashi Iwai from comment #6) > > You *did* use amdgpu driver in the early usage, no? The message might be a > > red herring. > > I am unsure. I did a plain install without further tinkering. The log you attached showed the amdgpu drm driver being used, so it was deployed. > > Which Tumbleweed snapshot it worked in which configuration > > Snapshots are gone, but Tumbleweed was kept up to date by running "zypper > dup". See attached journal. So no configuration change in your side. > > - What exactly gets broken now: tell the procedure; how to suspend and how > > to resume? > > From the journal: > > [ 89.306023] systemd[1]: Starting Suspend... > [ 89.315625] systemd-sleep[2156]: INFO: Skip running > /usr/lib/systemd/system-sleep/grub2.sleep for suspend > [ 89.316552] systemd-sleep[2151]: Suspending system... > [ 100.329743] systemd-sleep[2151]: System resumed. > [ 100.333370] systemd-sleep[2245]: INFO: Skip running > /usr/lib/systemd/system-sleep/grub2.sleep for suspend > [ 100.334508] systemd[1]: systemd-suspend.service: Succeeded. > [ 100.334894] systemd[1]: Finished Suspend. Erm, it's not clear "how" you triggered the suspend and resumed. By the lid close (if laptop), from KDE menu, or whatever? Just to be sure. Also, do I understand correct that you're dealing with the suspend-to-RAM, not the hibernation, right? > > - How is your configuration now > > Current system is: > > Operating System: openSUSE Tumbleweed 20201005 > KDE Plasma Version: 5.19.5 > KDE Frameworks Version: 5.74.0 > Qt Version: 5.15.1 > Kernel Version: 5.8.12-1-default > OS Type: 64-bit > Processors: 8 × AMD Ryzen 5 3400G with Radeon Vega Graphics > Memory: 29.3 GiB of RAM > Graphics Processor: AMD RAVEN > > See also attached journal. Thanks! The old log showed that it was 5.7.x kernel. Could you try to install the old kernel from OBS home:tiwai:5.7 repo, boot with it and test the suspend/resume? 
http://download.opensuse.org/repositories/home:/tiwai:/kernel:/5.7/standard/ > > (In reply to Takashi Iwai from comment #6) > Erm, it's not clear "how" you triggered the suspend and resumed. By the lid > close (if laptop), from KDE menu, or whatever? Just to be sure. KDE > Application Starter > Leave > Suspend > Also, do I understand correct that you're dealing with the suspend-to-RAM, > not the hibernation, right? Suspend to RAM. > The old log showed that it was 5.7.x kernel. Could you try to install the > old kernel from OBS home:tiwai:5.7 repo, boot with it and test the > suspend/resume? > > http://download.opensuse.org/repositories/home:/tiwai:/kernel:/5.7/standard/ Installed and got a freeze too. See attachment. Created attachment 842426 [details]
new boot
OK, thanks. Then could you try to crawl through the old journal and check which kernel version started showing the problem? The one you showed with 5.7.7 and the lastly tested was 5.7.12. There might be something between them you've tested. Also, please check whether "amd_iommu=off" boot option makes any improvement wrt this bug. (In reply to Takashi Iwai from comment #13) > OK, thanks. > Then could you try to crawl through the old journal and check which kernel > version started showing the problem? The one you showed with 5.7.7 and the > lastly tested was 5.7.12. There might be something between them you've > tested. Last good resume from journal is with 8.4-1-default. (In reply to Takashi Iwai from comment #14) > Also, please check whether "amd_iommu=off" boot option makes any improvement > wrt this bug. Booted with amd_iommu=off and got a freeze too. Created attachment 842428 [details]
last good boot
Hrm, 5.8-series still worked while the test with 5.7.12 failed now. This sounds rather like some change outside the kernel in the past triggered the problem? Ales, does this ring your bell? It's Tumbleweed, so everything should be fairly close to the latest upstream. Created attachment 842483 [details]
fresh install of snapshot 20201007
yet more error messages during suspend/resume
Created attachment 842489 [details]
snapshot 20201008 even worse
really bad experience; needed to roll back to 20201007 (which worked perfectly).
Hm, on 20201008, the amdgpu driver goes south even at a fresh boot. I noticed that this new TW snapshot includes an update of the kernel-firmware package. Could you try to downgrade the kernel-firmware package from the 20201008 state to the version in 20201007? You can find the old package in the OBS history repo, http://download.opensuse.org/history/
BTW you need to update only amdgpu firmware:
% zypper in --oldpackage --force http://download.opensuse.org/history/20201007/tumbleweed/repo/oss/noarch/kernel-firmware-amdgpu-20200916-1.1.noarch.rpm
Created attachment 842491 [details]
20201008 with firmware 20201007
Booted into 20201007, locked all firmware packages and duped to 20201008. System boots, but again freezes on resume.
OK, then the latest crash at boot was indeed a regression of amdgpu firmware. Just to make sure: the oldest kernel-firmware found in OBS history repo is: http://download.opensuse.org/history/20200907/tumbleweed/repo/oss/noarch/kernel-firmware-amdgpu-20200807-1.2.noarch.rpm Could you try to downgrade to this version and retest? Also, please try to boot with "firmware_class.dyndbg=+p" boot option, and give the dmesg output. This will contain the debug prints showing which firmware files are loaded. Created attachment 842492 [details]
still freezing
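To see at a glance which firmware files a boot actually picked up, the debug output can be filtered down to the file paths. This is only a sketch: the function name `list_loaded_firmware` is made up for illustration, and it assumes the debug lines have the form "Loading firmware from <path>", as in the journal excerpts quoted later in this report.

```shell
#!/bin/sh
# list_loaded_firmware (hypothetical helper): read kernel log text on stdin
# and print the path of every firmware file reported as loaded, one per line.
# Assumption: the log uses the "Loading firmware from <path>" wording.
list_loaded_firmware() {
    # Keep only lines with the marker and print the word that follows it.
    sed -n 's/.*Loading firmware from \([^ ]*\).*/\1/p'
}
```

It could be fed from the kernel log, for example `dmesg | list_loaded_firmware`, after booting with the "firmware_class.dyndbg=+p" option mentioned above.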
I might have a similar issue... After doing a zypper dup today I cannot boot my computer. In the rescue system I was able to see some errors in journalctl regarding amdgpu.
(In reply to Takashi Iwai from comment #23) > BTW you need to update only amdgpu firmware: > % zypper in --oldpackage --force > http://download.opensuse.org/history/20201007/tumbleweed/repo/oss/noarch/ > kernel-firmware-amdgpu-20200916-1.1.noarch.rpm
Downgrading the kernel-firmware-amdgpu package fixed the issue.
(In reply to Christian Hartmann from comment #27) > Downgrading the kernel-firmware-amdgpu package fixed the issue.
Could you give dmesg output with "firmware_class.dyndbg=+p" boot option, too? We need to check which firmware is involved. In the case of Karl, it was amdgpu/picasso*.
(In reply to Karl Mistelberger from comment #26) > Created attachment 842492 [details] > still freezing
Thanks. It's Picasso board, and this was already a problem in the past, hence we shipped the older firmware as a workaround. At the latest kernel-firmware update, we removed the workaround as I was informed that the issue should have been fixed, but apparently it's not fixed. So I'm going to put the old firmware again. However, the question is which old one; I'd really like to see whether the original issue (the GPU error at resume) comes from the firmware or not.
Now I uploaded various versions of picasso firmware files taken from linux-firmware.git. The tarball contains subdirectory for each version (e.g. 19.50, 20.10, ...). For testing it, try the following:
- Create /lib/firmware/updates/amdgpu directory:
% mkdir -p /lib/firmware/updates/amdgpu
- Copy the contents of the firmware version you want to test (e.g. 19.50):
% cp 19.50/amdgpu/picasso* /lib/firmware/updates/amdgpu/
- Rebuild initrd and retest:
% mkinitrd
% reboot
The version 20.40 is the same one as the latest kernel-firmware package, hence this is supposed to be broken. I included it to be sure.
Please check each version and let me know the behavior. Thanks! Created attachment 842495 [details]
Picasso firmware files
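When cycling through several firmware versions from the tarball, the copy step can be wrapped so that files from two versions never get mixed in the override directory. The following is a minimal sketch, not part of the original instructions: the function name `stage_picasso_firmware` and its arguments are hypothetical, and the default destination is the /lib/firmware/updates/amdgpu directory named in the comment.

```shell
#!/bin/sh
# stage_picasso_firmware SRC_ROOT VERSION [DEST]   (hypothetical helper)
# Copy SRC_ROOT/VERSION/amdgpu/picasso* into DEST, first removing any
# picasso* files left there by an earlier test run.
stage_picasso_firmware() {
    src="$1/$2/amdgpu"
    dest="${3:-/lib/firmware/updates/amdgpu}"
    # Refuse to continue if the requested version directory is missing.
    if [ ! -d "$src" ]; then
        echo "no such firmware version directory: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # Clear older overrides so two firmware versions are never mixed.
    rm -f "$dest"/picasso*
    cp "$src"/picasso* "$dest"/
}
```

A run such as `stage_picasso_firmware /path/to/unpacked/tarball 19.50` would still need to be followed by `mkinitrd` and a reboot, as described in the comment.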
(In reply to Takashi Iwai from comment #29) > Please check each version and let me know the behavior. Thanks! Tested all of them with 5.8.14-1-default and none of them works. Just to be sure: hofkirchen:~ # journalctl -b 0 --no-h --grep amdgpu|grep Loading Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_gpu_info.bin Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_sdma.bin Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_asd.bin Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_ta.bin Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_pfp.bin Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_me.bin Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_ce.bin Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_rlc_am4.bin Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_mec.bin Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_mec2.bin Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/raven_dmcu.bin Oct 11 10:57:29 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/updates/amdgpu/picasso_vcn.bin Thank you for quick testing. This concluded that the resume problem is no regression of the recent firmware files, at least. Also, could you tell which firmware version did boot properly? I suppose 20.40 caused the same problem? BTW, if you get stuck at boot due to the graphics problem, you can boot once with "nomodeset" boot option, and fix/revert things. 
The option will disable the native DRM driver. For reverting the manual firmware override, simply remove files/directories /lib/firmware/updates/* and rebuild initrd (call mkinitrd).
(In reply to Takashi Iwai from comment #32) > Thank you for quick testing. This concluded that the resume problem is no > regression of the recent firmware files, at least. > > Also, could you tell which firmware version did boot properly? I suppose > 20.40 caused the same problem?
Never assume anything. All of the five booted correctly and all of them froze upon resume.
Hmm. The latest TW kernel-firmware package contains 20.40, so this should have triggered the same problem. Could you confirm the following?
- Remove /lib/firmware/updates/*
- Install again the latest TW kernel-firmware (20201008)
- mkinitrd and reboot
If this shows the boot problem, try to put picasso 20.40 firmware again /lib/firmware/updates, mkinitrd and retest.
(In reply to Takashi Iwai from comment #34) > Hmm. The latest TW kernel-firmware package contains 20.40, so this should > have triggered the same problem. > > Could you confirm the following? > - Remove /lib/firmware/updates/* > - Install again the latest TW kernel-firmware (20201008) > - mkinitrd and reboot
Boot fails.
> If this shows the boot problem, try to put picasso 20.40 firmware again > /lib/firmware/updates, mkinitrd and retest.
Boot works. :-)
Created attachment 842496 [details] dmesg output
(In reply to Takashi Iwai from comment #28) > (In reply to Christian Hartmann from comment #27) > > Downgrading the kernel-firmware-amdgpu package fixed the issue. > > Could you give dmesg output with "firmware_class.dyndbg=+p" boot option, too? > We need to check which firmware is involved. > > In the case of Karl, it was amdgpu/picasso*.
I've uploaded my dmesg output... And yes, it also looks like picasso...
Weird... Could you try the following?
- Remove /lib/firmware/updates again, mkinitrd, and confirm that you get the unbootable state again - Boot with nomodeset, then run % unxz -f /lib/firmware/amdgpu/picasso*.xz mkinitrd, reboot and check whether it works now (In reply to Karl Mistelberger from comment #35) > (In reply to Takashi Iwai from comment #34) > > Hmm. The latest TW kernel-firmware package contains 20.40, so this should > > have triggered the same problem. > > > > Could you confirm the following? > > - Remove /lib/firmware/updates/* > > - Install again the latest TW kernel-firmware (20201008) > > - mkinitrd and reboot > > Boot fails. > > > If this shows the boot problem, try to put picasso 20.40 firmware again > > /lib/firmware/updates, mkinitrd and retest. > > Boot works. :-) Weird... Could you try the following? - Remove /lib/firmware/updates again, mkinitrd, and confirm that you get the unbootable state again - Boot with nomodeset, then run % unxz -f /lib/firmware/amdgpu/picasso*.xz mkinitrd, reboot and check whether it works now (In reply to Christian Hartmann from comment #36) > I've uploaded my dmesg output... And yes, it also looks like picasso... OK, then could you also check the test in comment 29? (In reply to Takashi Iwai from comment #37) > Weird... Could you try the following? 
> > - Remove /lib/firmware/updates again, mkinitrd, and confirm that you get the > unbootable state again > > - Boot with nomodeset, then run > % unxz -f /lib/firmware/amdgpu/picasso*.xz > mkinitrd, reboot and check whether it works now Tried that: Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_gpu_info.bin Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_sdma.bin Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_asd.bin Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_ta.bin Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_pfp.bin Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_me.bin Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_ce.bin Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_rlc_am4.bin Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_mec.bin Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_mec2.bin Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/raven_dmcu.bin.xz Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: Loading firmware from /lib/firmware/amdgpu/picasso_vcn.bin Uncompressed, ran mkinitrd and rebooted. 
Boot failed: Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110) Oct 11 20:00:30 kernel: [drm:amdgpu_device_ip_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v9_0> failed -110 Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: amdgpu: amdgpu_device_ip_init failed Oct 11 20:00:30 kernel: amdgpu 0000:06:00.0: amdgpu: Fatal error during GPU init Oct 11 20:00:30 kernel: kvm: disabled by bios Oct 11 20:00:30 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008 Oct 11 20:00:30 kernel: #PF: supervisor read access in kernel mode Oct 11 20:00:30 kernel: #PF: error_code(0x0000) - not-present page Oct 11 20:00:30 systemd-udevd[513]: 0000:06:00.0: Worker [537] failed Oct 11 20:00:30 kernel: kvm: disabled by bios Oct 11 20:00:31 sddm[931]: Failed to read display number from pipe Oct 11 20:00:31 sddm[931]: Display server failed to start. Exiting Oct 11 20:00:31 kernel: BUG: unable to handle page fault for address: 0000000000016928 Oct 11 20:00:31 kernel: #PF: supervisor read access in kernel mode Oct 11 20:00:31 kernel: #PF: error_code(0x0000) - not-present page Oct 11 20:00:31 systemd[1]: Failed to start X Display Manager. Recovered by forced install of kernel-firmware-amdgpu and copying 20.40/amdgpu/picasso* to /lib/firmware/updates/amdgpu/, mkinitrd and reboot. Manjaro Version 5.8.11-1-MANJARO + amd-ucode.img works fine for suspend/resume (In reply to Takashi Iwai from comment #38) > (In reply to Christian Hartmann from comment #36) > > I've uploaded my dmesg output... And yes, it also looks like picasso... > > OK, then could you also check the test in comment 29? Just did that... I first tried 20.30 and after that 20.40. Both times the system booted just fine. (In reply to Takashi Iwai from comment #38) > (In reply to Christian Hartmann from comment #36) > > I've uploaded my dmesg output... And yes, it also looks like picasso... > > OK, then could you also check the test in comment 29? 
I just did that... I first tried with 20.30 and after that with 20.40. Both times the system booted just fine.
Same issue here: freeze on boot after latest zypper dup on Tumbleweed at line:
fb0: switching to amdgpudrmfb from EFI VGA
and cpu fan goes on max rpm.
HP Pavilion Laptop 15-cw1xxx [AMD/ATI] Picasso [1002:15d8] (rev c3) ATOM BIOS: 113-PICASSO-114
Solution works: boot with the nomodeset parameter to reach the desktop and use this command to downgrade the broken package:
zypper in --oldpackage --force http://download.opensuse.org/history/20201007/tumbleweed/repo/oss/noarch/kernel-firmware-amdgpu-20200916-1.1.noarch.rpm
choose solution 3:
Solution 3: break kernel-firmware-amdgpu by ignoring some of its dependencies
There is a thread on the suse forum regarding this issue: https://forums.opensuse.org/showthread.php/545773-AMDGPU-failure-after-zypper-dup-today
greetings, Zbigniew
Now can people with Picasso board test the kernel-firmware-amdgpu package in OBS Kernel:HEAD repo? I didn't revert the amdgpu there yet, but there was some error about the split / copy script and it was fixed at first.
http://download.opensuse.org/repositories/Kernel:/HEAD/standard/noarch/kernel-firmware-amdgpu-20201005-334.1.noarch.rpm
Please make sure that you clear /lib/firmware/updates/* and uncompressed /lib/firmware/amdgpu/picasso* files beforehand.
If the boot still fails with this version, let's check again the following:
- Uncompress /lib/firmware/amdgpu/picasso*.xz, and retest
unxz -f /lib/firmware/amdgpu/picasso*.xz
mkinitrd
reboot
If it still fails,
- Compare the contents of those uncompressed picasso* firmware files with the 20.40 version of the tarball in comment 30. All those must be identical.
(In reply to Takashi Iwai from comment #43) > Now can people with Picasso board test the kernel-firmware-amdgpu package in > OBS Kernel:HEAD repo? I didn't revert the amdgpu there yet, but there was > some error about the split / copy script and it was fixed at first.
> > http://download.opensuse.org/repositories/Kernel:/HEAD/standard/noarch/ > kernel-firmware-amdgpu-20201005-334.1.noarch.rpm > > Please make sure that you clear /lib/firmware/updates/* and uncompressed > /lib/firmware/amdgpu/picasso* files beforehand. Still fails. > > If the boot still fails with this version, let's check again the following: > > - Uncompress /lib/firmware/amdgpu/picasso*.xz, and retest > unxz -f /lib/firmware/amdgpu/picasso*.xz > mkinitrd > reboot :~ # unxz -f /lib/firmware/amdgpu/picasso*.xz unxz: /lib/firmware/amdgpu/picasso_asd.bin.xz: File format not recognized unxz: /lib/firmware/amdgpu/picasso_ce.bin.xz: File format not recognized unxz: /lib/firmware/amdgpu/picasso_gpu_info.bin.xz: File format not recognized unxz: /lib/firmware/amdgpu/picasso_me.bin.xz: File format not recognized unxz: /lib/firmware/amdgpu/picasso_mec.bin.xz: File format not recognized unxz: /lib/firmware/amdgpu/picasso_mec2.bin.xz: File format not recognized unxz: /lib/firmware/amdgpu/picasso_pfp.bin.xz: File format not recognized unxz: /lib/firmware/amdgpu/picasso_rlc.bin.xz: File format not recognized unxz: /lib/firmware/amdgpu/picasso_sdma.bin.xz: File format not recognized unxz: /lib/firmware/amdgpu/picasso_vcn.bin.xz: File format not recognized :~ # (In reply to Karl Mistelberger from comment #44) > :~ # unxz -f /lib/firmware/amdgpu/picasso*.xz > unxz: /lib/firmware/amdgpu/picasso_asd.bin.xz: File format not recognized What shows the output of below? file /lib/firmware/amdgpu/picasso_asd.bin.xz (In reply to Takashi Iwai from comment #45) > (In reply to Karl Mistelberger from comment #44) > > :~ # unxz -f /lib/firmware/amdgpu/picasso*.xz > > unxz: /lib/firmware/amdgpu/picasso_asd.bin.xz: File format not recognized > > What shows the output of below? 
> file /lib/firmware/amdgpu/picasso_asd.bin.xz /lib/firmware/amdgpu/picasso_asd.bin.xz: empty :~ # ll /lib/firmware/amdgpu/picasso*.xz --w------- 11 root root 0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_asd.bin.xz --w------- 11 root root 0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_ce.bin.xz --w------- 11 root root 0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_gpu_info.bin.xz --w------- 11 root root 0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_me.bin.xz --w------- 11 root root 0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_mec.bin.xz --w------- 11 root root 0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_mec2.bin.xz --w------- 11 root root 0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_pfp.bin.xz --w------- 11 root root 0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_rlc.bin.xz -rw-r--r-- 1 root root 9160 Oct 8 13:23 /lib/firmware/amdgpu/picasso_rlc_am4.bin.xz --w------- 11 root root 0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_sdma.bin.xz -rw-r--r-- 1 root root 9548 Oct 8 13:23 /lib/firmware/amdgpu/picasso_ta.bin.xz --w------- 11 root root 0 Oct 12 09:21 /lib/firmware/amdgpu/picasso_vcn.bin.xz Hm. Something wrong with the installation. Could you retry? At best: % rm -rf /lib/firmware/amdgpu % zypper rm -u --nodeps kernel-firmware-amdgpu % zypper in --oldpackage -f kernel-firmware-amdgpu-XXX.rpm (where XXX is filled with the actual package rpm you've downloaded.) Then check the file command again. (In reply to Takashi Iwai from comment #47) > Hm. Something wrong with the installation. Could you retry? At best: > > % rm -rf /lib/firmware/amdgpu > % zypper rm -u --nodeps kernel-firmware-amdgpu > % zypper in --oldpackage -f kernel-firmware-amdgpu-XXX.rpm > (where XXX is filled with the actual package rpm you've downloaded.) > > Then check the file command again. 
:~ # ll /lib/firmware/amdgpu/picasso*.xz -rw-r--r-- 3 root root 31836 Oct 11 23:38 /lib/firmware/amdgpu/picasso_asd.bin.xz -rw-r--r-- 2 root root 3156 Oct 11 23:38 /lib/firmware/amdgpu/picasso_ce.bin.xz -rw-r--r-- 2 root root 112 Oct 11 23:38 /lib/firmware/amdgpu/picasso_gpu_info.bin.xz -rw-r--r-- 2 root root 6104 Oct 11 23:38 /lib/firmware/amdgpu/picasso_me.bin.xz -rw-r--r-- 4 root root 26048 Oct 11 23:38 /lib/firmware/amdgpu/picasso_mec.bin.xz -rw-r--r-- 4 root root 26048 Oct 11 23:38 /lib/firmware/amdgpu/picasso_mec2.bin.xz -rw-r--r-- 2 root root 8312 Oct 11 23:38 /lib/firmware/amdgpu/picasso_pfp.bin.xz -rw-r--r-- 2 root root 9292 Oct 11 23:38 /lib/firmware/amdgpu/picasso_rlc.bin.xz -rw-r--r-- 1 root root 9160 Oct 11 23:38 /lib/firmware/amdgpu/picasso_rlc_am4.bin.xz -rw-r--r-- 2 root root 7360 Oct 11 23:38 /lib/firmware/amdgpu/picasso_sdma.bin.xz -rw-r--r-- 1 root root 9548 Oct 11 23:38 /lib/firmware/amdgpu/picasso_ta.bin.xz -rw-r--r-- 3 root root 219540 Oct 11 23:38 /lib/firmware/amdgpu/picasso_vcn.bin.xz :-) OK, now one step forward. And it still fails to boot? (In reply to Takashi Iwai from comment #49) > OK, now one step forward. And it still fails to boot? 
It boots, however suspend/resume still freezes:
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
Oct 12 10:38:44 kernel: [drm] Fence fallback timer expired on ring sdma0
Oct 12 10:38:44 kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
Oct 12 10:38:44 kernel: [drm:process_one_work] *ERROR* ib ring test failed (-110).
(In reply to Karl Mistelberger from comment #50) > (In reply to Takashi Iwai from comment #49) > > OK, now one step forward. And it still fails to boot?
> > I boots, however suspend/resume still freezes: OK, that's more or less expected from previous results. We couldn't hit two birds in a shot :) Now the question is why the boot failure happened with the recent kernel-firmware package. It might be that hard-link contents in the package got broken via update. Let's wait for the test results from other people, too. I prepared another kernel-firmware packages in OBS home:tiwai:test:fw-fix repo. It uses fdupes with -s option to make the duped file symlinks instead of hard-links, as it might work better. Please test just a simple update of kernel-firmware-amdgpu package from: http://download.opensuse.org/repositories/home:/tiwai:/test:/fw-fix/openSUSE_Factory/noarch/kernel-firmware-amdgpu-20201005-336.1.noarch.rpm It's just to confirm that switching to symlink doesn't break things. I will do a test in the evening after work :-) (In reply to Takashi Iwai from comment #51) > Please test just a simple update of kernel-firmware-amdgpu package from: > > http://download.opensuse.org/repositories/home:/tiwai:/test:/fw-fix/ > openSUSE_Factory/noarch/kernel-firmware-amdgpu-20201005-336.1.noarch.rpm I reverted to broken firmware first by running zypper dup. Then I ran: The following package is going to be upgraded: kernel-firmware-amdgpu The following package is going to change vendor: kernel-firmware-amdgpu openSUSE -> obs://build.opensuse.org/home:tiwai 1 package to upgrade, 1 to change vendor. Overall download size: 7.6 MiB. Already cached: 0 B. No additional space will be used or freed after the operation. Continue? [y/n/v/...? shows all options] (y): Retrieving package kernel-firmware-amdgpu-20201005-336.1.noarch (1/1), 7.6 MiB ( 7.6 MiB unpacked) Boot now works. 
(In reply to Karl Mistelberger from comment #53) > (In reply to Takashi Iwai from comment #51) > > Please test just a simple update of kernel-firmware-amdgpu package from: > > > > http://download.opensuse.org/repositories/home:/tiwai:/test:/fw-fix/ > > openSUSE_Factory/noarch/kernel-firmware-amdgpu-20201005-336.1.noarch.rpm > > I reverted to broken firmware first by running zypper dup. Actually I wonder whether the boot still fails at this moment after zypper dup. If the problem was some hard-link mess via the update, it might work magically even without fixing anything else, just because you once uninstalled and cleaned up. > Then I ran: > > The following package is going to be upgraded: > kernel-firmware-amdgpu > > The following package is going to change vendor: > kernel-firmware-amdgpu > openSUSE -> obs://build.opensuse.org/home:tiwai > > 1 package to upgrade, 1 to change vendor. > Overall download size: 7.6 MiB. Already cached: 0 B. No additional space > will be used or freed after the operation. > Continue? [y/n/v/...? shows all options] (y): Retrieving package > kernel-firmware-amdgpu-20201005-336.1.noarch (1/1), 7.6 MiB ( 7.6 MiB > unpacked) > > Boot now works. Thanks for confirmation. I guess using symlink is a better option in anyway, so let's move toward this. (In reply to Takashi Iwai from comment #54) > Actually I wonder whether the boot still fails at this moment after zypper > dup. > If the problem was some hard-link mess via the update, it might work > magically even without fixing anything else, just because you once > uninstalled and cleaned up. Reverted from obs://build.opensuse.org/home:tiwai to openSUSE. Boot works now. 
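The listings earlier in this report showed zero-length picasso*.xz files with a link count of 11, which fits the hard-link theory. A rough sketch for spotting both conditions in a firmware directory follows; the function names are hypothetical, and the default directory is simply the one used throughout this thread.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a firmware directory; they only read
# the directory and change nothing.

# Print regular files in DIR that are zero bytes long.
find_empty_firmware() {
    dir="${1:-/lib/firmware/amdgpu}"
    # -size 0 matches only files that contain no data at all.
    find "$dir" -type f -size 0 | LC_ALL=C sort
}

# Print regular files in DIR whose inode has more than one name (hard links).
find_hardlinked_firmware() {
    dir="${1:-/lib/firmware/amdgpu}"
    find "$dir" -type f -links +1 | LC_ALL=C sort
}
```

Any output from `find_empty_firmware` would point at the kind of broken installation seen above. Output from `find_hardlinked_firmware` is not an error by itself, since the package deliberately hard-linked duplicate files at that time.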
(In reply to Takashi Iwai from comment #54)
> (In reply to Karl Mistelberger from comment #53)
> > (In reply to Takashi Iwai from comment #51)
> > > Please test just a simple update of kernel-firmware-amdgpu package from:
> > >
> > > http://download.opensuse.org/repositories/home:/tiwai:/test:/fw-fix/
> > > openSUSE_Factory/noarch/kernel-firmware-amdgpu-20201005-336.1.noarch.rpm
> >
> > I reverted to broken firmware first by running zypper dup.
>
> Actually I wonder whether the boot still fails at this moment after zypper
> dup.
> If the problem was some hard-link mess via the update, it might work
> magically even without fixing anything else, just because you once
> uninstalled and cleaned up.

If I'm not wrong, this would also explain the behaviour I faced when trying the old firmware releases and switching back to 20.40 still worked fine.

So, I've just checked going back to the official released version and boot fails.

(In reply to Takashi Iwai from comment #51)
> I prepared another kernel-firmware packages in OBS home:tiwai:test:fw-fix
> repo. It uses fdupes with -s option to make the duped file symlinks instead
> of hard-links, as it might work better.
> Please test just a simple update of kernel-firmware-amdgpu package from:
>
> http://download.opensuse.org/repositories/home:/tiwai:/test:/fw-fix/
> openSUSE_Factory/noarch/kernel-firmware-amdgpu-20201005-336.1.noarch.rpm
>
> It's just to confirm that switching to symlink doesn't break things.

Using this version I was able to boot.

(In reply to Christian Hartmann from comment #57)
> So, I've just checked going back to the official released version and boot
> fails.

OK. Did you uninstall kernel-firmware-amdgpu package once? Or only upgrade/downgrade the package?
If it's the latter case, try the following:

- Uninstall kernel-firmware-amdgpu once
  % zypper rm -u kernel-firmware-amdgpu

- Remove stale files in /lib/firmware/amdgpu, if any
  % rm -rf /lib/firmware/amdgpu

- Install the kernel-firmware-amdgpu package again from TW
  % zypper in kernel-firmware-amdgpu-20201005
  (pass some option to specify the repo or specify the proper rpm release number to get the TW package.)

I guess this would make it working for yours, too.

In anyway, it seems that the symlink version of fdupes works better, and I'm going to submit it now.

(In reply to Takashi Iwai from comment #58)
> (In reply to Christian Hartmann from comment #57)
> > So, I've just checked going back to the official released version and boot
> > fails.
>
> OK. Did you uninstall kernel-firmware-amdgpu package once? Or only
> upgrade/downgrade the package? If it's the latter case, try the following:
>
> - Uninstall kernel-firmware-amdgpu once
>   % zypper rm -u kernel-firmware-amdgpu
>
> - Remove stale files in /lib/firmware/amdgpu, if any
>   % rm -rf /lib/firmware/amdgpu
>
> - Install the kernel-firmware-amdgpu package again from TW
>   % zypper in kernel-firmware-amdgpu-20201005
>   (pass some option to specify the repo or specify the proper rpm release
>   number to get the TW package.)
>
> I guess this would make it working for yours, too.
>
> In anyway, it seems that the symlink version of fdupes works better, and I'm
> going to submit it now.

Yes, after uninstalling and deleting the files my system boots normally with the version from the official repo

This is an autogenerated message for OBS integration:
This bug (1177428) was mentioned in
https://build.opensuse.org/request/show/841342 Factory / kernel-firmware

I confirm. For me after uninstalling and deleting /lib/firmware/amdgpu and reinstalling official package my system boots normally with the version from the official repo.
Previously I reinstalled this package a few times and ran mkinitrd -f, but this did not help, so removing /lib/firmware/amdgpu may be crucial before reinstalling the official package, as I had never done this before Takashi said to do it.

Hint: maybe a preinstall script should do the cleanup task: rm -rf /lib/firmware/amdgpu ?

This is the second time I had to clean up and reinstall the amdgpu firmware package, so the problem returns sometimes.

Cleaning the directory can be dangerous at upgrading, so I'd like to avoid the hackish way.

The hardlink was established by fdupes call for the duplicated files, and this seems causing problems. As mentioned in the above, now I changed the call with -s option to use symlink instead of hardlink, and this must work around the pitfall. The fixed package is on its way to TW.

So, let's go back to the original issue, the resume failure. If the boot hang still happens with the next update package, please open another bug and track there. Thanks.

(In reply to Takashi Iwai from comment #62)
> Cleaning the directory can be dangerous at upgrading, so I'd like to avoid
> the hackish way.
>
> The hardlink was established by fdupes call for the duplicated files, and
> this seems causing problems. As mentioned in the above, now I changed the
> call with -s option to use symlink instead of hardlink, and this must work
> around the pitfall. The fixed package is on its way to TW.

This looks very much like bug 1175025. This is fixed in RPM 4.16, but breaks too many package builds so won't be checked in soon. I'll try to convince mls to backport instead.

This is an autogenerated message for OBS integration:
This bug (1177428) was mentioned in
https://build.opensuse.org/request/show/842515 Factory / rpm

Fedora vmlinuz-5.6.6-300.fc32.x86_64 suspend/resume works well.

(In reply to Karl Mistelberger from comment #65)
> Fedora vmlinuz-5.6.6-300.fc32.x86_64 suspend/resume works well.

Did you test the kernel with your current openSUSE system?
Also what about more recent kernels?
(In reply to Takashi Iwai from comment #66) > (In reply to Karl Mistelberger from comment #65) > > Fedora vmlinuz-5.6.6-300.fc32.x86_64 suspend/resume works well. > > Did you test the kernel with your current openSUSE system? > Also what about more recent kernels? Operating System: openSUSE Tumbleweed 20201014 KDE Plasma Version: 5.20.0 KDE Frameworks Version: 5.75.0 Qt Version: 5.15.1 Kernel Version: 5.9.1-1.g8abc535-default OS Type: 64-bit Processors: 8 × AMD Ryzen 5 3400G with Radeon Vega Graphics Memory: 29.3 GiB of RAM Graphics Processor: AMD RAVEN kernel-stable freezes too: 3400G:~ # zypper se -is kernel-default Loading repository data... Reading installed packages... S | Name | Type | Version | Arch | Repository ---+----------------+---------+---------------------+--------+------------------ i+ | kernel-default | package | 5.8.15-1.1.gc680e93 | x86_64 | (System Packages) i+ | kernel-default | package | 5.9.1-1.1.g8abc535 | x86_64 | kernel-stable 3400G:~ # (In reply to Karl Mistelberger from comment #67) > (In reply to Takashi Iwai from comment #66) > > (In reply to Karl Mistelberger from comment #65) > > > Fedora vmlinuz-5.6.6-300.fc32.x86_64 suspend/resume works well. > > > > Did you test the kernel with your current openSUSE system? > > Also what about more recent kernels? > > Operating System: openSUSE Tumbleweed 20201014 > KDE Plasma Version: 5.20.0 > KDE Frameworks Version: 5.75.0 > Qt Version: 5.15.1 > Kernel Version: 5.9.1-1.g8abc535-default I meant the recent *Fedora* kernel. They must ship the newer version than the tad old 5.6.y. And I don't know yet what you exactly tested with Fedora kernel... > OS Type: 64-bit > Processors: 8 × AMD Ryzen 5 3400G with Radeon Vega Graphics > Memory: 29.3 GiB of RAM > Graphics Processor: AMD RAVEN > > kernel-stable freezes too: Do you mean the freeze at boot, or freeze after resume? 
(In reply to Takashi Iwai from comment #68) > (In reply to Karl Mistelberger from comment #67) > > (In reply to Takashi Iwai from comment #66) > > > (In reply to Karl Mistelberger from comment #65) > > > > Fedora vmlinuz-5.6.6-300.fc32.x86_64 suspend/resume works well. > > > > > > Did you test the kernel with your current openSUSE system? > > > Also what about more recent kernels? > > > > Operating System: openSUSE Tumbleweed 20201014 > > KDE Plasma Version: 5.20.0 > > KDE Frameworks Version: 5.75.0 > > Qt Version: 5.15.1 > > Kernel Version: 5.9.1-1.g8abc535-default > > I meant the recent *Fedora* kernel. They must ship the newer version than > the tad old 5.6.y. And I don't know yet what you exactly tested with Fedora > kernel... I tested suspend/resume with Fedora, Manjaro and openSUSE here: 3400G:~ # inxi -SMCG System: Host: 3400G Kernel: 5.9.1-1.g8abc535-default x86_64 bits: 64 Console: tty 2 Distro: openSUSE Tumbleweed 20201014 Machine: Type: Desktop Mobo: Gigabyte model: B450 AORUS ELITE v: x.x serial: N/A UEFI: American Megatrends v: F51 date: 12/18/2019 CPU: Topology: Quad Core model: AMD Ryzen 5 3400G with Radeon Vega Graphics bits: 64 type: MT MCP L2 cache: 2048 KiB Speed: 1291 MHz min/max: 1400/3700 MHz Core speeds (MHz): 1: 1361 2: 1328 3: 1300 4: 1309 5: 1258 6: 1368 7: 1302 8: 1342 Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Picasso driver: amdgpu v: kernel Display: server: X.Org 1.20.9 driver: amdgpu FAILED: ati unloaded: fbdev,modesetting,vesa resolution: 1920x1080~60Hz OpenGL: renderer: AMD RAVEN (DRM 3.39.0 5.9.1-1.g8abc535-default LLVM 10.0.1) v: 4.6 Mesa 20.1.8 3400G:~ # > Do you mean the freeze at boot, or freeze after resume? Freeze after suspend/resume with openSUSE. No issues with Fedora and Manjaro. I will try to test a newer Fedora kernel. Tested openSUSE: 5.8.14-1.2, 5.8.15-1.1.gc680e93, 5.9.1-1.1.g8abc535 OK, thanks. I asked it because it's possibly a different user-space thing triggering the bug, not the kernel itself. 
If any, we may try to build a kernel with the same config from other distros that work, and confirm whether it still works or not. Fedora vmlinuz-5.8.15-201.fc32.x86_64 happily resumes from suspend. (In reply to Karl Mistelberger from comment #71) > Fedora vmlinuz-5.8.15-201.fc32.x86_64 happily resumes from suspend. I'm not sure whether you can deploy the Fedora kernel package onto openSUSE system, but it might be worth to try. Just install it via "rpm -ivh xxx.rpm --nodeps", run "mkinitrd" manually, and see whether it boots up. (In reply to Takashi Iwai from comment #72) > (In reply to Karl Mistelberger from comment #71) > > Fedora vmlinuz-5.8.15-201.fc32.x86_64 happily resumes from suspend. > > I'm not sure whether you can deploy the Fedora kernel package onto openSUSE > system, but it might be worth to try. Just install it via "rpm -ivh xxx.rpm > --nodeps", run "mkinitrd" manually, and see whether it boots up. I ran curl https://repos.fedorapeople.org/repos/thl/kernel-vanilla.repo > /etc/zypp/repos.d/kernel-vanilla.repo However I am lost with: 3400G:~ # zypper se -s kernel-vanilla Error building the cache: [kernel-vanilla-mainline|http://repos.fedorapeople.org/repos/thl/kernel-vanilla-mainline/fedora-20201014/x86_64/] Valid metadata not found at specified URL History: - [kernel-vanilla-mainline|http://repos.fedorapeople.org/repos/thl/kernel-vanilla-mainline/fedora-20201014/x86_64/] Repository type can't be determined. Warning: Skipping repository 'Linux vanilla kernels from mainline series' because of the above error. Some of the repositories have not been refreshed because of an error. 
3400G:~ # zypper lr kernel-vanilla-mainline Alias : kernel-vanilla-mainline Name : Linux vanilla kernels from mainline series URI : http://repos.fedorapeople.org/repos/thl/kernel-vanilla-mainline/fedora-20201014/x86_64/ Enabled : Yes GPG Check : ( p) Yes Priority : 99 (default priority) Autorefresh : Off Keep Packages : Off Type : NONE GPG Key URI : https://repos.fedorapeople.org/repos/thl/RPM-GPG-KEY-knurd-kernel-vanilla Path Prefix : Parent Service : Keywords : --- Repo Info Path : /etc/zypp/repos.d/kernel-vanilla.repo MD Cache Path : /var/cache/zypp/raw/kernel-vanilla-mainline 3400G:~ # It's better not to add repo but just download the target *.rpm file and install it directly. (In reply to Takashi Iwai from comment #74) > It's better not to add repo but just download the target *.rpm file and > install it directly. Here we are: 3400G:/var/cache/zypp/packages/Fedora # rpm -ivh * --nodeps warning: kernel-5.9.1-36.vanilla.1.fc32.x86_64.rpm: Header V4 RSA/SHA256 Signature, key ID 863625fa: NOKEY Verifying... ################################# [100%] Preparing... ################################# [100%] Updating / installing... 1:kernel-core-5.9.1-36.vanilla.1.fc################################# [ 20%] 2:kernel-modules-5.9.1-36.vanilla.1################################# [ 40%] 3:kernel-5.9.1-36.vanilla.1.fc32 ################################# [ 60%] 4:kernel-modules-extra-5.9.1-36.van################################# [ 80%] 5:kernel-modules-internal-5.9.1-36.################################# [100%] /var/tmp/rpm-tmp.Bw5jCe: line 1: /bin/kernel-install: No such file or directory warning: %posttrans(kernel-core-5.9.1-36.vanilla.1.fc32.x86_64) scriptlet failed, exit status 127 3400G:/var/cache/zypp/packages/Fedora # You can create the initrd manually like % /sbin/mkinitrd -k vmlinuz-5.9.... -i initrd-5.9.... where 5.9.... is the file name of /boot/vmlinuz-* corresponding to the Fedora kernel you've installed. 
Once mkinitrd has succeeded, update the GRUB entry like
  % /sbin/update-bootloader --add --image /boot/vmlinuz-5.9.... --initrd /boot/initrd-5.9....
  % /usr/sbin/grub2-mkconfig -o /boot/grub2/grub.cfg

Then reboot with that kernel with fingers crossed.

As the postinstall scripts fail no kernel is generated. :-(

(In reply to Karl Mistelberger from comment #77)
> As the postinstall scripts fail no kernel is generated. :-(

Then try to install with rpm --noscripts option.

Snapshot 20201019 fixes the freeze on resume from suspend.

Interesting, so it's likely either the kernel update to 5.9.x or the fix of kernel-firmware-amdgpu took effect.

In anyway, it's good to hear that the issue is gone :)

(In reply to Takashi Iwai from comment #80)
> Interesting, so it's likely either the kernel update to 5.9.x or the fix of
> kernel-firmware-amdgpu took effect.

There is the old kernel in http://download.opensuse.org/tumbleweed/repo/oss/ and new firmware in http://download.opensuse.org/update/tumbleweed/

i+ | kernel-default | package | 5.8.14-1.2 | x86_64 | Haupt-Repository (OSS)
i+ | kernel-firmware-all | package | 20201005-3.1 | noarch | Hauptaktualisierungs-Repository
i+ | kernel-firmware-amdgpu | package | 20201005-3.1 | noarch | Hauptaktualisierungs-Repository
...
i | kernel-firmware-usb-network | package | 20201005-3.1 | noarch | Hauptaktualisierungs-Repository
i | purge-kernels-service | package | 0-7.2 | noarch | Haupt-Repository (OSS)

I guess it's kernel-firmware workaround, but hey, who knows :)

In anyway, assume that it'll keep working, and let's close now.
Feel free to reopen if you encounter the same problem again. Thanks.

(In reply to Takashi Iwai from comment #82)
> I guess it's kernel-firmware workaround, but hey, who knows :)
>
> In anyway, assume that it'll keep working, and let's close now.
> Feel free to reopen if you encounter the same problem again. Thanks.
Changed the monitor and the freeze upon suspend/resume is back: 3400G:~ # hwinfo --monitor 35: None 00.0: 10002 LCD Monitor [Created at monitor.125] Unique ID: rdCR.K1i5gxVmsEC Parent ID: GBI1.Tt0a+NI8vi1 Hardware Class: monitor Model: "SAMSUNG LU28R55" Vendor: SAM "SAMSUNG" Device: eisa 0x1017 "LU28R55" Serial ID: "H4ZN302578" Resolution: 720x400@70Hz Resolution: 640x480@60Hz Resolution: 640x480@67Hz Resolution: 640x480@72Hz Resolution: 640x480@75Hz Resolution: 800x600@56Hz Resolution: 800x600@60Hz Resolution: 800x600@72Hz Resolution: 800x600@75Hz Resolution: 832x624@75Hz Resolution: 1024x768@60Hz Resolution: 1024x768@70Hz Resolution: 1024x768@75Hz Resolution: 1280x1024@75Hz Resolution: 1152x864@75Hz Resolution: 1280x720@60Hz Resolution: 1280x1024@60Hz Resolution: 3840x2160@60Hz Size: 632x360 mm Year of Manufacture: 2038 Week of Manufacture: 50 Detailed Timings #0: Resolution: 3840x2160 Horizontal: 3840 4016 4104 4400 (+176 +264 +560) +hsync Vertical: 2160 2168 2178 2250 (+8 +18 +90) +vsync Frequencies: 594.00 MHz, 135.00 kHz, 60.00 Hz Driver Info #0: Max. Resolution: 3840x2160 Vert. Sync Range: 50-75 Hz Hor. Sync Range: 30-135 kHz Bandwidth: 594 MHz Config Status: cfg=new, avail=yes, need=no, active=unknown Attached to: #12 (VGA compatible controller) 3400G:~ # journalctl -b -3 --grep amdgpu -o short-monotonic -p err -- Logs begin at Wed 2020-10-21 16:58:25 CEST, end at Thu 2020-10-29 16:00:08 CET. -- [ 274.870164] 3400G kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-22). [ 377.490546] 3400G kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110). 3400G:~ # OK, then please report and track the bug on the upstream bug tracker, e.g. the gitlab.freedesktop.org issues. The package bug must have been fixed, so the rest is pure the driver or the firmware bug, which we can't help much from distro side. 
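The two "IB test failed" lines in the journal excerpt above differ only in the error code. As an aid for reading such lines, here is a small sketch that extracts the ring name and code and prints the usual Linux errno name (22 is EINVAL, 110 is ETIMEDOUT). The function name and the short mapping table are illustrative assumptions, not part of amdgpu or journalctl, and any other code is simply reported as unknown.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print one summary
# line per "IB test failed on <ring> (-N)" message.
decode_ib_errors() {
    # Keep only matching lines and reduce them to "<ring> <N>"
    sed -n 's/.*IB test failed on \([a-z0-9_.]*\) (-\([0-9][0-9]*\)).*/\1 \2/p' |
    while read -r ring code; do
        # Map the two codes seen in this report to their errno names
        case "$code" in
            22)  name=EINVAL ;;
            110) name=ETIMEDOUT ;;
            *)   name=unknown ;;
        esac
        printf '%s: -%s (%s)\n' "$ring" "$code" "$name"
    done
}
```

It could be fed from the same command used above, for example: journalctl -b -3 --grep amdgpu -p err | decode_ib_errors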
(In reply to Takashi Iwai from comment #85) > OK, then please report and track the bug on the upstream bug tracker, e.g. > the gitlab.freedesktop.org issues. The package bug must have been fixed, so > the rest is pure the driver or the firmware bug, which we can't help much > from distro side. I did so: https://gitlab.freedesktop.org/drm/amd/-/issues/1354 However this morning I found suspend/resume doesn't work anymore with the old monitor. It worked with firmware in http://download.opensuse.org/update/tumbleweed/, but that's now gone and http://download.opensuse.org/tumbleweed/repo/oss/ is now used: 3400G:~ # zypper se -is kernel-firmware-amdgpu Loading repository data... Reading installed packages... S | Name | Type | Version | Arch | Repository ---+------------------------+---------+--------------+--------+-------------------------------- i+ | kernel-firmware-amdgpu | package | 20201005-3.1 | noarch | Hauptaktualisierungs-Repository 3400G:~ # Tested the following versions of kernel-firmware-amdgpu so far: 20200207-1.1 20200302-1.1 20200519-2.1 20200610-1.1 20200807-1.2 20200916-1.1 20201005-1.1 20201005-3.1 20201005-334.1 20201005-336.1 20201005-36.1 20201023-2.1 All of them fail on suspend to RAM/resume. The newest 20201023-2.1 adds some additional trouble: 3400G:~ # journalctl -b -p err -- Logs begin at Fri 2020-10-30 05:53:09 CET, end at Wed 2020-11-04 07:21:17 CET. -- Nov 04 07:10:09 3400G kernel: pci 0000:00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter. 
Nov 04 07:10:09 3400G systemd-modules-load[221]: Failed to find module 'platform-integrity' Nov 04 07:10:11 3400G systemd-modules-load[493]: Failed to find module 'platform-integrity' Nov 04 07:10:12 3400G kernel: kvm: disabled by bios Nov 04 07:10:12 3400G kernel: kvm: disabled by bios Nov 04 07:10:12 3400G kernel: kvm: disabled by bios Nov 04 07:10:12 3400G kernel: kvm: disabled by bios Nov 04 07:10:13 3400G kernel: kvm: disabled by bios Nov 04 07:10:13 3400G kernel: kvm: disabled by bios Nov 04 07:10:13 3400G kernel: kvm: disabled by bios Nov 04 07:10:13 3400G kernel: kvm: disabled by bios Nov 04 07:10:24 3400G kmail[2641]: No text-to-speech plug-ins were found. 3400G:~ # Using AMD Ryzen 3 3200G. In spring 2020 I was using Leap 15.1. To use built-in graphics I needed kernel newer than 4.12 from Leap 15.1. So I used kernels from kernel:stable repo. With Leap 15.1 + kernel 5.5.x suspend to RAM worked OK. With Leap 15.1 + kernel 5.6.x suspend to RAM stopped to work. Then I used kernel 5.3 for Leap 15.1 from Leap 15.2 developers repo to get suspend to RAM working. Now suspend to RAM is working OK with Leap 15.2 and standard 5.3 kernel. (In reply to Nikolai Nikolaevskii from comment #89) > Using AMD Ryzen 3 3200G. > In spring 2020 I was using Leap 15.1. > To use built-in graphics I needed kernel newer than 4.12 from Leap 15.1. > So I used kernels from kernel:stable repo. > With Leap 15.1 + kernel 5.5.x suspend to RAM worked OK. > With Leap 15.1 + kernel 5.6.x suspend to RAM stopped to work. > Then I used kernel 5.3 for Leap 15.1 from Leap 15.2 developers repo to get > suspend to RAM working. > Now suspend to RAM is working OK with Leap 15.2 and standard 5.3 kernel. I tried Leap 5.3.18-lp152.66-default and still get the following messages on suspend to RAM: Mar 13 09:46:17 Leap kernel: Non-boot CPUs are not disabled Mar 13 09:46:17 Leap kernel: amdgpu 0000:08:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-22). 
Mar 13 09:46:17 Leap kernel: [drm:process_one_work] *ERROR* ib ring test failed (-22). So I am wondering what your exact kernel version is. Mine are: i+ | kernel-default | package | 5.3.18-lp152.66.2 | x86_64 | Hauptaktualisierungs-Repository i+ | kernel-default | package | 5.3.18-lp152.63.1 | x86_64 | Hauptaktualisierungs-Repository i+ | kernel-firmware-all | package | 20201120-35.1 | noarch | (System Packages) It's hard to say. We used to have a workaround (keeping the old firmware file) for Vega10 in Leap 15.1 and Leap 15.2, but this was dropped in TW (hence also Kernel:HEAD and Kernel:stable) as well as Leap 15.3. So, if you have kernel-firmware-all package on your system (not kernel-firmware), it means you having the latest firmware from TW/Kernel:HEAD, and the workaround in the firmware was gone. And, IIRC, this problem depends on the hardware setup such as the backlight level, so the upstream couldn't reproduce the issue. If you see the problem with the latest TW kernel and with the latest kernel-firmware-amdgpu package, you should report the problem to upstream and resolve the bug there at first. (In reply to Takashi Iwai from comment #91) > If you see the problem with the latest TW kernel and with the latest > kernel-firmware-amdgpu package, you should report the problem to upstream > and resolve the bug there at first. I reported the bug here: https://gitlab.freedesktop.org/drm/amd/-/issues/1354 But I am still waiting for a response. Any idea how to proceed? (In reply to Takashi Iwai from comment #91) > It's hard to say. We used to have a workaround (keeping the old firmware > file) for Vega10 in Leap 15.1 and Leap 15.2, but this was dropped in TW > (hence also Kernel:HEAD and Kernel:stable) as well as Leap 15.3. > > So, if you have kernel-firmware-all package on your system (not > kernel-firmware), it means you having the latest firmware from > TW/Kernel:HEAD, and the workaround in the firmware was gone. 
And, IIRC, > this problem depends on the hardware setup such as the backlight level, so > the upstream couldn't reproduce the issue. > > If you see the problem with the latest TW kernel and with the latest > kernel-firmware-amdgpu package, you should report the problem to upstream > and resolve the bug there at first. We can get firmware files from amdgpu-pro drivers, package "RPMS/noarch/amdgpu-dkms-firmware*". The latest 20.50: https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-20-50 20.40: https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-20-40 20.10: https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-20-10 19.50: https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux Change last numbers to get another version. What files to use? vega10*.bin or vega12*.bin or vega20*.bin or vegam*.bin? Ryzen 3200G has Vega 8, Ryzen 3400G has Vega 11 (Radeon™ RX Vega 11 Graphics). To OP (Karl Mistelberger): try to use firmware from amdgpu-pro-20.10. I am unsure: - which file to download - which packet to install downloaded amdgpu-pro-20.50-1234663-sle-15.2.tar.xz and inspected RPMS/noarch/amdgpu-dkms-firmware-5.9.10.69-1234663.noarch.rpm for vega11 to no avail. Tested firmware versions 20.10 and 20.50. Suspend to RAM fails with both versions. With 20.50 the machine hangs. With 20.10 it recovers, see attachment. Created attachment 848217 [details]
journal suspend to RAM
OMG, we need picasso_*.bin, not vega*.bin files!
(For Ryzen APU with Vega graphics, 12 files in my case).

ILL OP solved his problem by changing motherboard from Gigabyte B450 Aorus Elite to Asus PRIME B450-PLUS.
Mine Asus X570 + Picasso AMD Ryzen 3200G suspends to RAM OK.

Possible reasons:
1. EFI firmware.
2. Problems with LED subsystem.
https://forums.opensuse.org/showthread.php/553786-AMDGPU-errors-and-occasional-crashes-hangs?p=3031924#post3031924
https://forums.opensuse.org/showthread.php/553786-AMDGPU-errors-and-occasional-crashes-hangs?p=3032042#post3032042

But maybe solution is in updating kernel to 5.12:
https://forums.opensuse.org/showthread.php/553669-Display-Freeze-on-Laptop-and-attached-HDMI-monitor-AMD-graphic-card?p=3035895#post3035895
https://bugs.mageia.org/show_bug.cgi?id=25882

(In reply to Nikolai Nikolaevskii from comment #98)
> ILL OP solved his problem by changing motherboard from Gigabyte B450 Aorus
> Elite to Asus PRIME B450-PLUS.
> Mine Asus X570 + Picasso AMD Ryzen 3200G suspends to RAM OK.
>
> Possible reasons:
> 1. EFI firmware.
> 2. Problems with LED subsystem.

The ASUSTeK model: PRIME B450-PLUS suspends/resumes flawlessly since moving from Gigabyte B450 Aorus.

Spurious crashes of GPU with IO_PAGE_FAULTs observed.

This was fixed months ago and just started happening again I believe in 5.14?

(In reply to Felipe Martinez from comment #101)
> This was fixed months ago and just started happening again I believe in 5.14?

Do you mean experiencing it again on your machine? Details please.

That's exactly right Takashi. I originally tested out a fix through the PBS with you guys' help and a particular 6 patches I had found on some posts online. Eventually it all worked great, then starting with, I believe, 5.14 (maybe 5.13?) it stopped sleeping again.

This is a Lenovo Ideapad3 with a 4500U (perhaps a 5500U, I forget now; I'm out of town).
It sometimes goes to sleep fine but doesn't wake up, and sometimes it refuses to go to sleep and just goes dark with a flat white light (not blinking).

What I can try doing is recompiling a kernel with the 6 patches I had gone with originally and see if that brings us back to working fashion.

(In reply to Felipe Martinez from comment #103)
> What I can try doing is recompiling a kernel with the 6 patches I had gone
> with originally and see if that brings us back to working fashion.

That'd be appreciated. Let us know the result!

If the problem is persistent, maybe it's worth opening another bug report and tracking it there instead of sticking here.

Does this still happen? It looks like the bug has bitrotted after such a long time :/.

(In reply to Jiri Slaby from comment #105)
> Does this still happen? It looks like the bug has bitrotted after such a long time :/.

Replaced the motherboard with a different model, which is working properly. I think the bug survived. However I sold the old board and can't verify.

A new bug popped up on new hardware, less severe, but annoying anyway and presumably a very robust one: https://bugzilla.opensuse.org/show_bug.cgi?id=1206864

Ok, without ability to further investigate, let's close this until someone else hits this.