Bug 1214910

Summary: Yast2 Bootloader frozen during Initializing Boot Loader Configuration
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Christian Tallner <christian.tallner>
Component: X.Org
Assignee: Gfx Bugs <gfx-bugs>
Status: RESOLVED NORESPONSE
QA Contact: Gfx Bugs <gfx-bugs>
Severity: Normal
Priority: P3 - Medium
CC: christian.tallner, patrik.jakobsson, tiwai, tzimmermann
Version: Current
Flags: sndirsch: needinfo? (christian.tallner)
Target Milestone: ---
Hardware: x86-64
OS: openSUSE Tumbleweed
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: Created with Shift-F8 after a reboot
Picture of the frozen Yast2 Bootloader Module
Xorg Log, after reproduced error and rebooted

Description Christian Tallner 2023-09-03 13:04:12 UTC
Created attachment 869221 [details]
Created with Shift-F8 after a reboot

When I open the Bootloader page in YaST2, the Bootloader module opens and then freezes. I cannot close the Bootloader module and have to reboot.
The status shown on the page is:
Initializing Boot Loader Configuration
(check mark)  Check boot loader
(check mark)  Load boot loader settings
Comment 1 Christian Tallner 2023-09-03 13:05:45 UTC
Created attachment 869222 [details]
Picture of the frozen Yast2 Bootloader Module
Comment 2 Stefan Hundhammer 2023-09-04 13:45:13 UTC
Hm... in those logs, I see several times

> <libqt-warning> The X11 connection broke (error 1). Did the X11 server die?

> y2-17-partitioner.log:2023-08-14 20:07:05 <2> localhost.localdomain(29728) [qt-ui] YQUI.cc(qMessageHandler):656 <libqt-warning> The X11 connection broke (error 1). Did the X11 server die?
> y2-39-sw_single.log:2023-08-25 18:55:54 <2> twr7600(30179) [qt-ui] YQUI.cc(qMessageHandler):656 <libqt-warning> The X11 connection broke (error 1). Did the X11 server die?
> y2-41-bootloader.log:2023-09-03 14:04:07 <2> twr7600(4220) [qt-ui] YQUI.cc(qMessageHandler):656 <libqt-warning> The X11 connection broke (error 1). Did the X11 server die?
> y2-46-bootloader.log:2023-09-03 14:12:05 <2> twr7600(4958) [qt-ui] YQUI.cc(qMessageHandler):656 <libqt-warning> The X11 connection broke (error 1). Did the X11 server die?
> y2-47-bootloader.log:2023-09-03 14:45:05 <2> twr7600(10206) [qt-ui] YQUI.cc(qMessageHandler):656 <libqt-warning> The X11 connection broke (error 1). Did the X11 server die?
> y2-48-bootloader.log:2023-09-03 14:56:35 <2> twr7600(3833) [qt-ui] YQUI.cc(qMessageHandler):656 <libqt-warning> The X11 connection broke (error 1). Did the X11 server die?
Comment 3 Stefan Hundhammer 2023-09-04 13:59:13 UTC
From Xorg.0.log of the attached y2logs tarball:

> [    23.978] (II) modeset(0): glamor X acceleration enabled on AMD Radeon Graphics (raphael_mendocino, LLVM 16.0.6, DRM 3.52, 6.4.11-1-default)
> [    23.978] (II) modeset(0): glamor initialized
> ...
> ...
> [    24.016] (II) modeset(0): [DRI2]   DRI driver: radeonsi
> [    24.016] (II) modeset(0): [DRI2]   VDPAU driver: radeonsi
> ...
> ...
> [    24.019] (II) AIGLX: Loaded and initialized radeonsi
> [    24.019] (II) GLX: Initialized DRI2 GL provider for screen 0

But no error or warning AFAICS.


From rpm-qa:

> kernel-firmware-radeon-20230814-1.1	  (openSUSE)  openSUSE Tumbleweed
> libdrm_radeon1-2.4.115-2.4		  (openSUSE)  openSUSE Tumbleweed
> libvdpau_radeonsi-23.1.6-1699.357.pm.1  (http://packman.links2linux.de)  Essentials / openSUSE_Tumbleweed
> libvulkan_radeon-23.1.6-1699.357.pm.1	  (http://packman.links2linux.de)  Essentials / openSUSE_Tumbleweed
Comment 4 Stefan Hundhammer 2023-09-04 14:07:47 UTC
AFAICS those YaST modules that crash (see comment #2) receive a non-recoverable X error "The X11 connection broke (error 1)", reported by the Qt libs; and then they terminate.

Now the question is why you get that X error. It seems to start some time between those YaST invocations:

> y2-38-partitioner.log:  2023-08-25 18:48:41  y2base called with ["partitioner", "qt", "-name", "YaST2", "-icon", "yast"]
> y2-39-sw_single.log:  2023-08-25 18:50:18  y2base called with ["sw_single", "qt", "-name", "YaST2", "-icon", "yast"]

But it doesn't *always* happen, just for certain YaST modules:

% grep -c 'X11 connection broke' y2-*.log
> y2-00-installation.log:0
> y2-01-repositories.log:0
> y2-02-sw_single.log:0
> y2-03-virtualization.log:0
> y2-04-bootloader.log:0
> y2-05-sw_single.log:0
> y2-06-sw_single.log:0
> y2-07-sw_single.log:0
> y2-08-online_update.log:0
> y2-09-sw_single.log:0
> y2-10-bootloader.log:0
> y2-11-sw_single.log:0
> y2-12-sw_single.log:0
> y2-13-virt-install.log:0
> y2-14-sw_single.log:0
> y2-15-partitioner.log:0
> y2-16-partitioner.log:0
> y2-17-partitioner.log:1
> y2-18-partitioner.log:0
> y2-19-bootloader.log:0
> y2-20-partitioner.log:0
> y2-21-sw_single.log:0
> y2-22-repositories.log:0
> y2-23-OneClickInstallWorker.log:0
> y2-24-sw_single.log:0
> y2-25-repositories.log:0
> y2-26-sw_single.log:0
> y2-27-OneClickInstallWorker.log:0
> y2-28-repositories.log:0
> y2-29-sw_single.log:0
> y2-30-sw_single.log:0
> y2-31-sw_single.log:0
> y2-32-sw_single.log:0
> y2-33-repositories.log:0
> y2-34-lan.log:0
> y2-35-firewall.log:0
> y2-36-partitioner.log:0
> y2-37-partitioner.log:0
> y2-38-partitioner.log:0
> y2-39-sw_single.log:1
> y2-40-sw_single.log:0
> y2-41-bootloader.log:1
> y2-42-firewall.log:0
> y2-43-firewall.log:0
> y2-44-firewall.log:0
> y2-45-firewall.log:0
> y2-46-bootloader.log:1
> y2-47-bootloader.log:1
> y2-48-bootloader.log:1
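
For anyone who wants to repeat this count on an unpacked y2logs tarball without grep, here is a minimal Python sketch. It assumes the logs are plain text files matching y2-*.log in one directory; the function names `count_matching_lines` and `affected_logs` are made up for illustration and are not part of YaST.

```python
from pathlib import Path

# Message reported by the Qt libs in the YaST logs quoted above.
NEEDLE = "X11 connection broke"


def count_matching_lines(log_dir, pattern="y2-*.log", needle=NEEDLE):
    """Return {file name: number of lines containing needle}, like grep -c."""
    counts = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with path.open(errors="replace") as handle:
            counts[path.name] = sum(1 for line in handle if needle in line)
    return counts


def affected_logs(counts):
    """Return only the log names with at least one matching line."""
    return [name for name, hits in counts.items() if hits > 0]


if __name__ == "__main__":
    import sys

    # Directory with the unpacked logs; defaults to the current directory.
    counts = count_matching_lines(sys.argv[1] if len(sys.argv) > 1 else ".")
    for name in affected_logs(counts):
        print(f"{name}:{counts[name]}")
```

Printing only the affected logs makes it easier to see when the problem started than the full list of zeros.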
Comment 5 Stefan Hundhammer 2023-09-04 14:16:00 UTC
This looks more like a graphics driver problem to me than a YaST problem. "raphael_mendocino" (from Xorg.0.log; see comment #3) appears to be the integrated GPU of an AMD Ryzen 7000 "Raphael" CPU.
Comment 6 Stefan Hundhammer 2023-09-04 14:40:01 UTC
In the 'dmesg' file of the attached y2logs tarball, I see this backtrace:

> [    4.022103] Hardware name: Micro-Star International Co., Ltd. MS-7E28/PRO A620M-E (MS-7E28), BIOS 1.45 08/02/2023
> [    4.022104] RIP: 0010:dp_retrieve_lttpr_cap+0x165/0x190 [amdgpu]
> [    4.022263] Code: 04 25 28 00 00 00 75 45 48 83 c4 10 89 d8 5b 5d e9 40 2e 66 f5 48 c7 c2 10 24 25 c1 be 02 00 00 00 31 ff e8 1d d3 25 f5 eb b7 <0f> 0b c6 85 8c 02 00 00 80 b9 80 00 00 00 48 c7 c2 58 b8 22 c1 31
> [    4.022264] RSP: 0018:ffffb62140537698 EFLAGS: 00010246
> [    4.022265] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00ffffffffffffff
> [    4.022266] RDX: 0000000000000007 RSI: ffffb62140537698 RDI: 0000000000000000
> [    4.022267] RBP: ffff900c8ae97000 R08: 0000000000000008 R09: 00000000000f0000
> [    4.022268] R10: 0000000000000002 R11: 0000000000000228 R12: 0000000000000001
> [    4.022269] R13: 0000000000000001 R14: ffff900c8ae94000 R15: ffff900c8ee87d80
> [    4.022270] FS:  00007fede24ddd40(0000) GS:ffff901397d80000(0000) knlGS:0000000000000000
> [    4.022271] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    4.022272] CR2: 00007ffc2227d878 CR3: 000000080cc5e000 CR4: 0000000000750ee0
> [    4.022273] PKRU: 55555554
> [    4.022273] Call Trace:
> [    4.022276]  <TASK>
> [    4.022276]  ? dp_retrieve_lttpr_cap+0x165/0x190 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.022418]  ? __warn+0x81/0x130
> [    4.022422]  ? dp_retrieve_lttpr_cap+0x165/0x190 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.022557]  ? report_bug+0x171/0x1a0
> [    4.022560]  ? handle_bug+0x3c/0x80
> [    4.022563]  ? exc_invalid_op+0x17/0x70
> [    4.022564]  ? asm_exc_invalid_op+0x1a/0x20
> [    4.022568]  ? dp_retrieve_lttpr_cap+0x165/0x190 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.022703]  ? dp_retrieve_lttpr_cap+0x10a/0x190 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.022834]  retrieve_link_cap+0x5c/0xa40 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.022963]  ? show_bios_limit+0x40/0x90
> [    4.022966]  ? dp_is_sink_present+0xbc/0x120 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.023099]  detect_link_and_local_sink+0xa6e/0xed0 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.023246]  ? __x86_return_thunk+0x9/0x10
> [    4.023248]  ? dm_write_reg_func+0x22/0x80 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.023417]  ? __x86_return_thunk+0x9/0x10
> [    4.023418]  ? dm_write_reg_func+0x22/0x80 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.023558]  ? __x86_return_thunk+0x9/0x10
> [    4.023560]  ? generic_reg_update_ex+0xb2/0x200 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.023732]  ? __x86_return_thunk+0x9/0x10
> [    4.023734]  link_detect+0x3a/0x470 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.023887]  ? dal_gpio_destroy_irq+0x25/0x40 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.024036]  ? __x86_return_thunk+0x9/0x10
> [    4.024038]  ? query_hpd_status+0x6e/0xa0 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.024176]  amdgpu_dm_init.isra.0+0xf36/0x1e00 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.024338]  ? __pfx_enable_assr+0x10/0x10 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.024480]  ? __pfx_update_config+0x10/0x10 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.024614]  dm_hw_init+0x12/0x30 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.024747]  amdgpu_device_init+0x1cce/0x22b0 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.024875]  ? __x86_return_thunk+0x9/0x10
> [    4.024877]  ? __x86_return_thunk+0x9/0x10
> [    4.024878]  ? pci_bus_read_config_word+0x4a/0x90
> [    4.024881]  amdgpu_driver_load_kms+0x19/0x190 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.025007]  amdgpu_pci_probe+0x141/0x420 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.025130]  local_pci_probe+0x42/0xa0
> [    4.025133]  pci_device_probe+0xc7/0x230
> [    4.025136]  really_probe+0x19b/0x3e0
> [    4.025139]  ? __pfx___driver_attach+0x10/0x10
> [    4.025140]  __driver_probe_device+0x78/0x160
> [    4.025142]  driver_probe_device+0x1f/0x90
> [    4.025143]  __driver_attach+0xd2/0x1c0
> [    4.025144]  bus_for_each_dev+0x74/0xc0
> [    4.025148]  bus_add_driver+0x116/0x220
> [    4.025150]  driver_register+0x59/0x100
> [    4.025152]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu 88133889359c296cc6f50e549ff667d16c487301]
> [    4.025271]  do_one_initcall+0x47/0x220
> [    4.025274]  ? __x86_return_thunk+0x9/0x10
> [    4.025275]  ? kmalloc_trace+0x2a/0xa0
> [    4.025278]  do_init_module+0x60/0x240
> [    4.025282]  __do_sys_init_module+0x17f/0x1b0
> [    4.025283]  ? __seccomp_filter+0x31b/0x4e0
> [    4.025287]  do_syscall_64+0x5d/0x90
> [    4.025289]  ? do_syscall_64+0x6c/0x90
> [    4.025291]  ? __x86_return_thunk+0x9/0x10
> [    4.025292]  ? syscall_exit_to_user_mode+0x1b/0x40
> [    4.025293]  ? __x86_return_thunk+0x9/0x10
> [    4.025294]  ? do_syscall_64+0x6c/0x90
> [    4.025296]  ? __x86_return_thunk+0x9/0x10
> [    4.025297]  ? syscall_exit_to_user_mode+0x1b/0x40
> [    4.025298]  ? __x86_return_thunk+0x9/0x10
> [    4.025299]  ? do_syscall_64+0x6c/0x90
> [    4.025301]  entry_SYSCALL_64_after_hwframe+0x77/0xe1
> [    4.025303] RIP: 0033:0x7fede3045d0e
> [    4.025311] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ea 10 13 00 f7 d8 64 89 01 48
> [    4.025312] RSP: 002b:00007ffc22296258 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> [    4.025314] RAX: ffffffffffffffda RBX: 0000555c19014e90 RCX: 00007fede3045d0e
> [    4.025314] RDX: 00007fede31de061 RSI: 00000000016fa71b RDI: 00007feddec1d010
> [    4.025315] RBP: 00007fede31de061 R08: 0000000000261000 R09: 0000000000000000
> [    4.025316] R10: 0000000000013a51 R11: 0000000000000246 R12: 0000000000020000
> [    4.025317] R13: 0000000000000000 R14: 0000555c19002b40 R15: 0000000000000000
> [    4.025319]  </TASK>
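
In a trace like this, the frames prefixed with "?" are addresses the kernel found on the stack but could not verify as part of the call chain, so the frames without "?" give the more trustworthy call path. A rough Python sketch for filtering them out follows; it assumes dmesg lines in the format quoted above, and the name `reliable_frames` is hypothetical.

```python
import re

# Matches a timestamped frame such as
#   [    4.022834]  retrieve_link_cap+0x5c/0xa40 [amdgpu 8813...]
# An optional "? " prefix marks a frame the unwinder could not verify.
FRAME = re.compile(
    r"\[\s*\d+\.\d+\]\s+(?P<unreliable>\? )?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)[^\]]*\])?"
)


def reliable_frames(dmesg_text):
    """Return (symbol, module) for each frame not marked with '?'.

    module is None for frames that belong to the core kernel.
    """
    frames = []
    for line in dmesg_text.splitlines():
        match = FRAME.search(line)
        if match and not match.group("unreliable"):
            frames.append((match.group("symbol"), match.group("module")))
    return frames


if __name__ == "__main__":
    import sys

    # Reads dmesg output from stdin, e.g. the 'dmesg' file from y2logs.
    for symbol, module in reliable_frames(sys.stdin.read()):
        print(f"{symbol} [{module}]" if module else symbol)
```

Register dumps and the RIP lines do not match the pattern, so only call trace frames are returned.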
Comment 7 Stefan Dirsch 2023-09-04 14:46:30 UTC
Yes, this looks like an amdgpu kernel module issue to me. Reassigning.
Comment 8 Takashi Iwai 2023-09-04 15:04:17 UTC
It's just a kernel warning at boot time, and it can be a red herring, not necessarily the cause of the hang.  Actually the kernel proceeded to boot, and the initialization of AMDGPU finished after the warning.  Remember that the system boots up, and the problem happens later in the GUI.

And, the log doesn't show any sign of kernel crash afterwards.
So, it smells more like a crash in user-space stuff.

Let's gather the crash info for X at first.
Comment 9 Stefan Dirsch 2023-09-04 20:16:31 UTC
Hmm. I don't see X crashing. I would need Xorg.0.log.old after reproducing the issue and reboot. Either you find it in /var/log/Xorg.0.log or ~/.local/share/xorg
Comment 10 Stefan Dirsch 2023-09-04 20:28:00 UTC
(In reply to Stefan Dirsch from comment #9)
> Hmm. I don't see X crashing. I would need Xorg.0.log.old after reproducing
> the issue and reboot. Either you find it in /var/log/Xorg.0.log or

Of course I mean /var/log ...

> ~/.local/share/xorg
Comment 11 Christian Tallner 2023-09-05 15:59:22 UTC
Created attachment 869295 [details]
Xorg Log, after reproduced error and rebooted
Comment 12 Stefan Dirsch 2023-09-05 21:55:37 UTC
(In reply to Christian Tallner from comment #11)
> Created attachment 869295 [details]
> Xorg Log, after reproduced error and rebooted

Hmm. I don't see any crash in this Xorg.0.log.old. :-(
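
For reference, a quick way to check an Xorg log for error entries is to look for lines carrying the (EE) marker right after the timestamp. The Python sketch below assumes the usual timestamped Xorg log layout; the function name `error_lines` is made up for illustration, and an empty result only means no (EE) lines were logged, not that X behaved correctly.

```python
import re

# "(EE)" directly after the timestamp marks an error entry. The marker
# legend near the top of the log also mentions "(EE)", but not in that
# position, so it is not matched.
ERROR_LINE = re.compile(r"^\[\s*\d+\.\d+\]\s+\(EE\)")


def error_lines(log_text):
    """Return all lines of an Xorg log that are tagged (EE)."""
    return [line for line in log_text.splitlines() if ERROR_LINE.match(line)]


if __name__ == "__main__":
    import sys

    # Default path as discussed in comments 9 and 10.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/Xorg.0.log.old"
    with open(path, errors="replace") as handle:
        for line in error_lines(handle.read()):
            print(line)
```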
Comment 13 Stefan Dirsch 2024-03-29 01:26:27 UTC
I still think it's a kernel issue. So I suggest booting with the kernel boot parameter

  modprobe.blacklist=amdgpu

so things come up with simpledrm. With that, an installation should be possible. Christian, could you give this a try? But I'm afraid you gave up on Tumbleweed a long time ago ...
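
To verify after booting that the parameter was actually passed, the running kernel's command line can be read from /proc/cmdline. A small Python sketch follows; it assumes the comma-separated form of modprobe.blacklist=, and the function name `blacklisted_modules` is hypothetical.

```python
def blacklisted_modules(cmdline):
    """Return module names listed in modprobe.blacklist= on a kernel command line.

    The option may appear more than once and may carry several
    comma-separated module names.
    """
    modules = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == "modprobe.blacklist":
            modules.extend(name for name in value.split(",") if name)
    return modules


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel.
    with open("/proc/cmdline") as handle:
        modules = blacklisted_modules(handle.read())
    print("amdgpu blacklisted" if "amdgpu" in modules else "amdgpu not blacklisted")
```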
Comment 14 Stefan Dirsch 2024-05-06 23:20:23 UTC
Yeah. Seems so. Closing ...