Bug 1177707

Summary: Zypper dup completely bricked my computer, can't boot
Product: [openSUSE] openSUSE Tumbleweed
Reporter: teo teo <teo8976>
Component: Kernel
Assignee: openSUSE Kernel Bugs <kernel-bugs>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Critical
Priority: P5 - None
CC: 2012gdwu, fvogt, jreidinger, mrmazda, teo8976, tiwai
Version: Current
Target Milestone: ---
Hardware: x86-64
OS: openSUSE Tumbleweed
Whiteboard: AMD
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: normal boot, recovery mode boot

Description teo teo 2020-10-14 15:49:57 UTC
Created attachment 842633 [details]
normal boot

I bought a new laptop just a few days ago. I installed openSUSE Tumbleweed from scratch on it. It was working great. I transferred all my data from the old laptop (and wiped that one out).

I started getting a notification that there were updates available, I chose "Install Updates", but it systematically gave me the error "Package Updater has crashed" (or something like that).
I think it's bug 1177556.

So I googled the issue and I ran from a terminal:
  sudo zypper dup

which is what people in forums suggested to do as an alternative way to install the updates. I rebooted as the updater said I needed to.


Now MY COMPUTER IS BRICKED. When I boot I get a black screen with 3 errors (though I think I was seeing at least the last two also when I could boot).

I tried "Advanced options for OpenSUSE" from the boot menu, which gives me 4 options: two kernel versions, and a corresponding "Recovery Mode" option for each one.
Both kernel versions fail to boot with the same errors.

With the recovery mode (which I have no idea what I would be supposed to do with) I get a lot more messages in the black screen, as if it was getting farther in the boot process, but it also gets stuck, the last message being:

fb0: switching to amdgpudrmfb from EFI VGA.


Thanks a lot, OpenSUSE.
This is even worse than Ubuntu.

I attach a photo of the screen when booting normally, and one of the boot screen when booting in Recovery Mode.

Well, actually it seems this stupid bug tracker only allows attaching one file to the report, so I'll attach the second picture later.
Comment 1 teo teo 2020-10-14 15:50:32 UTC
Created attachment 842634 [details]
recovery mode boot
Comment 2 teo teo 2020-10-14 16:13:35 UTC
I am able to boot, and at least use my laptop and access my data (if only to save it somewhere and install another OS that doesn't brick my computer at the slightest update), by adding nomodeset to the boot line. (I have no idea what that is)

However, with that, I cannot use an external monitor connected via HDMI, which worked before the zypper dup that bricked the computer.
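
For context: nomodeset disables kernel mode setting, so the amdgpu driver does not take over the display, which would be consistent with the HDMI output no longer working. A minimal sketch for checking whether the workaround is active on the running system; the function name has_nomodeset is hypothetical, and the only assumption is that the kernel command line is readable at /proc/cmdline:

```shell
# has_nomodeset [FILE]: succeed if the kernel command line stored in FILE
# contains the word "nomodeset". FILE defaults to /proc/cmdline.
has_nomodeset() {
    cmdline_file="${1:-/proc/cmdline}"
    # -w matches whole words only; -q suppresses output
    grep -qw nomodeset "$cmdline_file"
}

# Example:
#   has_nomodeset && echo "booted with nomodeset (workaround active)"
```

Once the firmware is fixed, the parameter can be dropped from the boot line again to get normal mode setting back.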


Should I expect openSUSE to be completely unreliable and unstable with the AMD processor and GPU? If so, please tell me, because if that is the issue, I still have time to return the laptop and get another one with an Intel CPU and a different GPU. I didn't expect to have to worry about that with any major and popular brand. Is there any at all on which openSUSE is stable and works??
Comment 3 teo teo 2020-10-14 21:44:29 UTC
I have booted with a live Leap USB stick, and it shows exactly the same errors, all 3 of them, but then it boots just fine. So I guess I was right here:

> (though I think I was seeing at least the last two also when I could boot)

Therefore those errors must be completely unrelated.

That's funny, because the way I found out about nomodeset as a workaround to be able to boot was by googling the errors.
Comment 4 Josef Reidinger 2020-10-15 08:57:55 UTC
Hi teo, I was also hit by the same issue. It looks like something is broken with onboard AMD discrete cards. Leap 15.2 works fine, so it has to be something in the newer kernel or the AMD framebuffer. Assigning to the kernel maintainers, who may know who is responsible for this part.

BTW, it is not bricked; the framebuffer just does not display anything.
If more info is needed about hardware, I can also provide it.
Comment 5 Fabian Vogt 2020-10-15 09:14:40 UTC
Is this maybe the same as bug 1177428?
Comment 6 Takashi Iwai 2020-10-15 09:41:50 UTC
(In reply to Fabian Vogt from comment #5)
> Is this maybe the same as bug 1177428?

Very likely.  Please try the procedure described in
  https://apibugzilla.suse.com/show_bug.cgi?id=1177428#c58

This should recover the amdgpu firmware.
Comment 7 teo teo 2020-10-15 11:08:28 UTC
> Very likely.  Please try the procedure described in
>   https://apibugzilla.suse.com/show_bug.cgi?id=1177428#c58
> 
> This should recover the amdgpu firmware.

Thank you! That worked!

Now, how do I prevent this from happening again?
Comment 8 teo teo 2020-10-15 11:15:56 UTC
Sorry (not my fault actually) but the comment was submitted before I finished writing (I went to edit a typo in the subject of the issue, and that submitted the unfinished comment too).

What I meant to ask was: how do I prevent the next updates from installing the broken amdgpu driver or whatever it is again, and ensure the working one is kept? Obviously I don't want to hold all system updates of everything, and remain stuck forever with today's version of every software on my system.
Comment 9 Takashi Iwai 2020-10-15 11:17:03 UTC
The very latest kernel-firmware package should work now as-is, since we switched to symlinks instead of hardlinks.

Please reopen if you encounter the same problem again the next time you upgrade the kernel-firmware package.
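
A rough way to see whether the installed firmware files still use hard links, as a sketch only: the directory /lib/firmware/amdgpu and the function name list_hardlinked are assumptions, and this is not an official verification procedure.

```shell
# list_hardlinked DIR: print regular files under DIR that have more than
# one hard link. Symlinks are not reported, because -type f skips them.
list_hardlinked() {
    dir="$1"
    find "$dir" -type f -links +1
}

# Example (the directory is an assumption about where amdgpu firmware lives):
#   list_hardlinked /lib/firmware/amdgpu
# Installed firmware packages and their versions can be listed with:
#   rpm -qa 'kernel-firmware*'
```

If the package really ships symlinks now, the function would presumably print nothing for that directory.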
Comment 10 Takashi Iwai 2020-10-15 12:56:28 UTC
BTW, regarding the breakage caused by the firmware updates in general: there is no proper way to fix this other than the rollback.  In theory, you can save the former initrd manually before the update, but such a mechanism isn't implemented at the package level.

I once proposed such a failsafe mechanism, but it was declined because of the wasted resources and the danger of filling up the precious /boot partition space.
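
Saving the former initrd manually could look roughly like the sketch below. The function name backup_initrd, the ".bak" suffix and the backup directory are hypothetical; the initrd path /boot/initrd-$(uname -r) is an assumption about the usual openSUSE layout. The copy goes outside /boot so it does not eat into that partition.

```shell
# backup_initrd SRC DEST_DIR: copy the initrd image SRC into DEST_DIR,
# keeping its mode and timestamps, under the name <basename>.bak.
backup_initrd() {
    src="$1"       # initrd image to preserve
    dest_dir="$2"  # directory that will hold the copy
    [ -f "$src" ] || { echo "no such initrd: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    # -L copies the file a symlink points to; -p preserves mode and timestamps
    cp -Lp "$src" "$dest_dir/$(basename "$src").bak"
}

# Example (as root, before running 'zypper dup'):
#   backup_initrd "/boot/initrd-$(uname -r)" /root/initrd-backup
```

Restoring would then mean copying the saved file back over the broken initrd, for instance from a rescue or live system; that step is not covered here.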
Comment 11 teo teo 2020-10-15 15:21:20 UTC
> Please reopen if you encounter the same problem again the next time you
> upgrade the kernel-firmware package

How do I upgrade it now? (Or has it not been released yet?)

Once upgraded, how do I check that I have the right version?

> regarding the breakage by the firmware updates in general; there is no 
> proper way to fix this other than the rollback

What is the rollback?

> In theory, you can save the former initrd manually before the update

Is there a step-by-step guide on how to do that, and on how to use whatever it is you save to restore the working system?

> I once proposed such a failsafe mechanism, but it was declined because...

Then you MUST do the following:
Whenever an update this risky is about to be installed, you should:
- give a gigantic warning that the system may be rendered unbootable (so at least I know to wait until the next weekend before I install the pending updates),
- allow holding that update while installing the other ones (I guess the update manager widget already allows you to do that with any update, but it kept crashing, see bug 1177556),
- together with the above-mentioned warning, give a link to the above-mentioned guide.
Comment 12 teo teo 2020-10-15 15:27:53 UTC
> How do I upgrade it now? (Or has it not been released yet?)

Also, if I download a Tumbleweed ISO now (on a USB stick) and re-install it from scratch, will it have the new supposedly-fixed version?
Comment 13 Felix Miata 2020-10-15 23:43:36 UTC
Apparently not an issue for all AMD APUs/GPUs on UEFI (e.g. GCN 2nd gen):
# inxi -SGMIay
System:
  Host: ara88 Kernel: 5.8.14-1-default x86_64 bits: 64 compiler: gcc v: 10.2.1
  parameters: mitigations=auto consoleblank=0 radeon.cik_support=0
   amdgpu.cik_support=1   video=1440x900@60 drm.debug=0x1e log_buf_len=1M 3
  Desktop: Trinity R14.0.8 tk: Qt 3.5.0 info: kicker wm: Twin 3.0 dm: TDM
  Distro: openSUSE Tumbleweed 20201012
Machine:
  Type: Desktop Mobo: ASRock model: FM2A88X Extreme6+ serial: E80-38024200616
  UEFI: American Megatrends v: P4.20 date: 01/13/2016
Graphics:
  Device-1: AMD Kaveri [Radeon R7 Graphics] vendor: ASRock driver: amdgpu
  v: kernel alternate: radeon bus ID: 00:01.0 chip ID: 1002:1313
  Display: x11 server: X.Org 1.20.9 driver: amdgpu
  unloaded: fbdev,modesetting,vesa alternate: ati display ID: :0 screens: 1
  Screen-1: 0 s-res: 2560x2520 s-dpi: 120 s-size: 541x533mm (21.3x21.0")
  s-diag: 759mm (29.9")
  Monitor-1: DisplayPort-0 res: 2560x1440 hz: 60 dpi: 109
  size: 598x336mm (23.5x13.2") diag: 686mm (27")
  Monitor-2: HDMI-A-0 res: 2560x1080 hz: 60 dpi: 97
  size: 673x284mm (26.5x11.2") diag: 730mm (28.8")
  OpenGL: renderer: AMD KAVERI (DRM 3.38.0 5.8.14-1-default LLVM 10.0.1)
  v: 4.6 Mesa 20.1.8 direct render: Yes
Info:...running in: konsole inxi: 3.1.07
Comment 14 teo teo 2020-10-16 08:11:49 UTC
Is it any of these?

kernel-firmware-platform (20201005-3.1)
kernel-firmware-radeon (20201005-3.1)
ucode-amd (20201005-3.1)