Bugzilla – Bug 1218935
[Build V.95.1] Review mutter-SLE-bsc984738-grab-display.patch, which causes xterm failed to launch on GNOME45.
Last modified: 2024-06-26 21:10:20 UTC
Steps:
1. Install SLES + WE with the system role "SLES with GNOME". (See https://openqa.suse.de/tests/13276212#step/accept_selected_role_SLES_with_GNOME/1)
2. Boot to the desktop -> Press Alt-F2 -> Type "xterm" and then press "Enter"

Actual Result: xterm cannot be launched on GNOME 45.

## Observation
openQA test in scenario sle-15-SP6-Online-V-Staging-x86_64-gnome@64bit fails in [xterm](https://openqa.suse.de/tests/13276212/modules/xterm/steps/6)

## Test suite description
Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml. Maintainer: slindomansilla
The standard scenario where we mainly just follow installation suggestions without any adjustments as long as the default desktop is gnome.

## Reproducible
Fails since (at least) Build [V.80.2](https://openqa.suse.de/tests/13172329)

## Expected result
Last good: (unknown) (or more recent)

## Further details
Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Online-V-Staging&machine=64bit&test=gnome&version=15-SP6)
Not sure if it is related, but the journal says gsd-usb-protect is always crashing...
Below are the detailed installation steps:
1. Insert a DVD or a bootable USB stick containing the installation image SLE-15-SP6-Online-x86_64-BuildV.95.1-Media1.iso (available at https://openqa.suse.de/tests/13276234/asset/iso/SLE-15-SP6-Online-x86_64-BuildV.95.1-Media1.iso), then reboot the computer to start the installation program.
2. Select "SUSE Linux Enterprise Server 15SP6" as the product to install. (See this screenshot: https://openqa.suse.de/tests/13276234#step/install_SLES/1)
3. Register via SCC; the SCC_URL is http://all-V.95.1.proxy.scc.suse.de
4. On the "Extension and Module selection" screen, check "Desktop Application Module 15SP6 x86_64" in addition to the default selection. (See https://openqa.suse.de/tests/13276234#step/register_module_desktop/1)
5. Follow the installation process. Skip installing add-ons. (See https://openqa.suse.de/tests/13276234#step/skip_install_addons/1)
6. Select "SLES with GNOME" as the system role. (See https://openqa.suse.de/tests/13276234#step/accept_selected_role_SLES_with_GNOME/1)
7. Then follow the installation process with the suggested settings. Please don't forget to create a user for your system.
Here is an overview of the installation settings: https://openqa.suse.de/tests/13276234#step/launch_installation/1
(In reply to Alynx Zhou from comment #1)
> Not sure if it is related, but the journal says gsd-usb-protect is always
> crashing...

Alynx, can you help with this in detail, please? I'd like to see what exactly blocks xterm from launching. Please follow the steps in comment#1 to get an identical system locally (if needed).
(In reply to Yifan Jiang from comment #3)
> (In reply to Alynx Zhou from comment #1)
> > Not sure if it is related, but the journal says gsd-usb-protect is always
> > crashing...
>
> Alynx, can you help on this in details please? I'd like to see what the
> exact problem blocks xterm to launch. Please follow the steps in the
> comment#1 to get an identical system locally (if needed).

OK, will try tomorrow.
Hi, latest build for Staging V is now 97.1 (still has the same problem). Unfortunately tests (and :GA) were a bit degraded in the past weeks so I can't say for certain if this is due to the GNOME update or unrelated stuff. Please let me know if I can help somehow. Thanks!
Yeah, with the latest Staging build 97.1, some tests still fail: https://openqa.suse.de/tests/13285242# Among the failures, xterm fails due to this bug. The failures of keymap_or_locale_x11, sshxterm, gedit and glxgears are also caused by this bug given the current situation (not sure whether new issues will show up after the xterm issue is fixed). The failure of shutdown (https://openqa.suse.de/tests/13285242#step/shutdown/9) should be fixed by adding new needles, so we can ignore it. Many of our test cases in openQA rely on xterm to apply settings, so if this issue can't be fixed before the next snapshot, we might face tons of failures on https://openqa.suse.de
I tried setting up a VM locally today with the same ISO as openQA, and I cannot reproduce this bug.
Sure, I am preparing local env, will try and see if this can be reproduced with Build V.97.1 on my side. Will update you later.
(In reply to Grace Wang from comment #6)
> Yeah, with latest Staging build 97.1, there are still some tests failed.
> https://openqa.suse.de/tests/13285242#

Interesting, the journal shows xterm is launched by gnome-shell (Alt-F2):

> Jan 18 19:41:48.512196 susetest systemd[2469]: Started Application launched by gnome-shell.

and the xterm process is actually there, according to https://openqa.suse.de/tests/13285242/logfile?filename=xterm-basic_health_check.txt:

> bernhard 23459 2651 0.0 1.2 55608 12712 S 00:00:00 xterm
Staging V has been merged. V has been frozen again so GNOME 45 will still be there for testing; I will check if it happens on the GA build as well once we have one.
(In reply to Alynx Zhou from comment #7)
> I've tried setup a VM locally today with the same iso on openqa. And I
> cannot reproduce this bug.

I tried several times in my local env (Build V.97.1), and this issue can be reproduced every time. xterm can't be launched and the system hangs without any response. @Alynx, can you please let me know whether you followed the same installation steps as I described in comment #2? Or you can find me on Slack.
(In reply to Grace Wang from comment #12)
> (In reply to Alynx Zhou from comment #7)
> @Alynx, can you please let me know if you follow the same installation steps
> with what I described in comment #2? Or you can find me on slack.

I couldn't find where to set the SCC URL. I think this is the problem?
(In reply to Alynx Zhou from comment #13)
> (In reply to Grace Wang from comment #12)
> > (In reply to Alynx Zhou from comment #7)
> > @Alynx, can you please let me know if you follow the same installation steps
> > with what I described in comment #2? Or you can find me on slack.
>
> I didn't find where I could set the SCC url, I think this is the problem?

You can specify the SCC URL in the Boot Options just like in this step: https://openqa.suse.de/tests/13289913#step/bootloader_start/6
After setting the SCC URL, I could reproduce this bug.
It looks like this affects not only xterm: any server-side decorated window makes the desktop hang. I've tried setting MUTTER_DEBUG and MUTTER_VERBOSE, but didn't find anything wrong.
I tried building and installing mutter and gnome-shell 45.3 on 15SP6, and this bug still exists. I am wondering whether it is an X problem; maybe we need to update the X server in 15SP6?
I also get some other problems on my VM. For example, if I try to restart the VM, sometimes the desktop does not show up (only the default X "x" cursor is shown), or GDM is fine but the GNOME session fails to start. @xiaoguang do you have the same problem?
Same bug happens to me on the GA build, and as Alynx said, not only xterm is affected - easily reproducible with yast2 for example. And indeed sometimes the session won't start at all - I'll try staging xorg from Factory in Staging:V, let's see...
Tried updating xorg-x11-server to 21.1.11; this is still not fixed.
GNOME Shell says the following:

Jan 24 17:10:23 bogon gnome-shell[3005]: Missing required core component Settings, expect trouble…

Is this expected?
CCing X11 maintainer.

This is my observation: the system freezes after launching xterm (under X11 GNOME).

The system does not freeze under Wayland GNOME.

Other than xterm, vncviewer and pidgin (which uses gtk2, which uses X11) also don't show up after launching.

(BTW, you can try this qcow2 image: https://openqa.suse.de/tests/13328085/asset/hdd/SLE-15-SP6-x86_64-Build47.2-sled-gnome.qcow2)
Even basic commands like "xlsclients" hang. "ltrace" shows the hang at "xcb_connect". "gdb" shows the hang at "xcb_connect_to_fd" <- "xcb_connect_to_display_with_auth_info".
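For anyone who wants to check for this kind of hang without blocking their own shell, here is a minimal sketch that runs a command with a timeout and reports whether it finished. The helper name `probe` and the 5-second default are assumptions for illustration only; "xlsclients" is the command mentioned above.

```python
import subprocess
import sys


def probe(cmd: list[str], timeout: float = 5.0) -> str:
    """Run cmd and classify the outcome as 'ok', 'failed' or 'hang'.

    'hang' means the command did not finish within `timeout` seconds;
    subprocess.run() kills the child in that case before raising.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "hang"
    except FileNotFoundError:
        # The command is not installed on this system.
        return "failed"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # On an affected X11 session this would be expected to report "hang".
    print("xlsclients:", probe(["xlsclients"]))
```

A "hang" result only says the client did not return in time; it does not tell where it is stuck, so ltrace or gdb are still needed for that.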
BTW IceWM works as expected for me. I've tried downgrading to gnome-shell + mutter 44 and the xterm works: https://openqa.suse.de/tests/13329734 Note that this is without SLE-specific patches as they weren't rebased on 44. I will now try again with 45 without SLE patches, then maybe trying to bisect changes from 44 -> 45.
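The bisecting idea mentioned above can be sketched as a plain binary search over an ordered list of changes. This is only an illustration under assumptions: the change names and the `is_good` callback are hypothetical, it assumes the last change in the list is known to be bad, and it assumes that once a change breaks things every later state stays broken. On a real git history, `git bisect` does this job.

```python
from typing import Callable, Sequence


def first_bad(changes: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first change for which the build stops working.

    is_good(change) must answer: "does a build containing everything up to
    and including this change still work?" (hypothetical test hook).
    """
    if not changes:
        raise ValueError("need at least one change to bisect")
    lo, hi = 0, len(changes) - 1  # changes[hi] is assumed to be bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(changes[mid]):
            lo = mid + 1  # the breakage was introduced after mid
        else:
            hi = mid      # mid is bad, so the culprit is mid or earlier
    return changes[lo]


if __name__ == "__main__":
    history = ["c1", "c2", "c3", "c4", "c5"]  # made-up change names
    # Pretend everything before c3 works.
    print(first_bad(history, lambda c: history.index(c) < 2))
```

Each step halves the range, so five changes need at most three test builds.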
(In reply to Jia Zhaocong from comment #22)
> CCing X11 maintainer.
>
> This is my observation: the system freezes after launching xterm (under X11
> Gnome).
>
> The system does not freeze under wayland Gnome.
>
> Other than xterm, vncviewer, pidgin (which uses gtk2 which uses x11) also
> doesn't show after launching.
>
> (BTW, you can try this qcow2 image:
> https://openqa.suse.de/tests/13328085/asset/hdd/SLE-15-SP6-x86_64-Build47.2-sled-gnome.qcow2)

Could you please tell me the root password for this system?
(In reply to Jia Zhaocong from comment #23)
> Even basic commands like "xlsclients" hangs,
>
> "ltrace" shows hang at "xcb_connect".
>
> "gdb" shows hang at "xcb_connect_to_fd" <-
> "xcb_connect_to_display_with_auth_info".

Could be that we also need a libxcb update for SP6. Not only the libX11 update.

https://build.suse.de/project/monitor/home:sndirsch:X11:XOrg:sle15_sp6_testing

Replacing all installed xcb and libX11 RPMs would be an option.
(In reply to Stefan Dirsch from comment #25)
> (In reply to Jia Zhaocong from comment #22)
> > CCing X11 maintainer.
> >
> > This is my observation: the system freezes after launching xterm (under X11
> > Gnome).
> >
> > The system does not freeze under wayland Gnome.
> >
> > Other than xterm, vncviewer, pidgin (which uses gtk2 which uses x11) also
> > doesn't show after launching.
> >
> > (BTW, you can try this qcow2 image:
> > https://openqa.suse.de/tests/13328085/asset/hdd/SLE-15-SP6-x86_64-Build47.2-sled-gnome.qcow2)
>
> Could you please tell me the root password for this system?

Told you via Slack. Thanks for looking into this problem so quickly.
> Could be that we also need a libxcb update for SP6. Not only the libX11 update. > > https://build.suse.de/project/monitor/home:sndirsch:X11:XOrg:sle15_sp6_testing > > Replacing all installed xcb and libX11 RPMs would be an option. I've tried with xcb and xcb-proto via Factory yesterday, but no change (not sure if the ones from your repo are different) Follow-up on my test: mutter 45 + gnome-shell 45 without SLE-specific patches make xterm work: https://openqa.suse.de/tests/13329989# Going to partially re-enable them and check again.
I tried with the libX11 and libxcb package updates from my repo, built against SP6, and I still see the same issue. Things work with a bare X server running IceWM on top.
I think the following patch is what makes mutter misbehave:

# PATCH-FIX-SLE mutter-SLE-bsc984738-grab-display.patch bsc#984738 bgo#769387 -- Revert a upstream commit to avoid X11 race condition that results in wrong dialog sizes.
Patch1002: mutter-SLE-bsc984738-grab-display.patch

My own repo with this patch disabled (at least on my local VM it works):
https://download.suse.de/ibs/home:/epaolantonio:/branches:/SUSE:/SLE-15-SP6:/GA/standard/home:epaolantonio:branches:SUSE:SLE-15-SP6:GA.repo

Direct link to the rpm:
https://download.suse.de/ibs/home:/epaolantonio:/branches:/SUSE:/SLE-15-SP6:/GA/standard/x86_64/mutter-45.1-150600.6.1.x86_64.rpm

I'm doing a rebuild in Staging:V with this patch disabled, and will trigger openQA tests again.
I can confirm replacing mutter with your new version without that patch fixes the issue. I've tested the system with updated libxcb/libX11 packages though.
Thanks for testing! It took a bit, but I also ran :V again with the patch disabled, and it looks all green: https://openqa.suse.de/tests/13332422 I'm keeping this bug open for feedback from the desktop team. Is the patch still needed and if so, could you please take a look? Thanks in advance!
It seems that the patch was added for a fairly old bug, and grabbing the X server is not recommended by upstream... Trying to CC the original patch submitter for an opinion.
Thanks Eugenio, Stefan and everyone! I updated the title for further tracking: we'll need to decide how to deal with this patch (remove it or rewrite it based on the current window decoration infrastructure).
I've got some ideas about the cause. In https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2175, the X11 server-side decorations (frames) were moved into an independent process called mutter-x11-frames. The likely cause is that the patch grabs the X server while managing an X11 window, which blocks the connection between mutter-x11-frames and the X server, so it cannot create a frame for xterm. I'll try to solve this.
Some more detail on what happens with mutter-SLE-bsc984738-grab-display.patch: This patch grabs the X server while managing a window, so other programs cannot interact with the X server via their own connections. A server-side decorated window actually consists of two X windows: one for the program and another for the frame. Before GNOME 45 (or maybe 44), the frame was drawn inside mutter's own process, so grabbing the X server did not block it. Now the frame is drawn by the mutter-x11-frames process, so if mutter grabs the X server, mutter-x11-frames cannot iterate over X windows or create X windows for frames. So when mutter gets a new X window from a program, it grabs the X server because of that patch, waits for a frame X window from mutter-x11-frames while managing the window, and only then ungrabs the X server. Because the X server is grabbed, mutter-x11-frames cannot iterate over X windows or create the frame X window until the server is ungrabbed. So the two wait for each other forever...
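To make the circular wait described above easier to see, here is a toy model in Python. It is not mutter code and does not talk to an X server: a lock stands in for the server grab, one thread stands in for mutter-x11-frames, and the caller plays the window manager. All names are made up for illustration, and a timeout is used so that the demo terminates where the real processes would wait forever.

```python
import threading


def simulate(grab_during_manage: bool, timeout: float = 0.5) -> bool:
    """Return True if the window gets its frame, False if the wait times out."""
    server_grab = threading.Lock()   # stands in for the X server grab
    new_window = threading.Event()   # a client window has appeared
    frame_ready = threading.Event()  # the frames process created the frame

    def frames_process() -> None:
        # Stand-in for mutter-x11-frames: it needs its own access to the
        # server, which is blocked while somebody else holds the grab.
        new_window.wait()
        with server_grab:
            frame_ready.set()

    worker = threading.Thread(target=frames_process, daemon=True)
    worker.start()

    if grab_during_manage:
        server_grab.acquire()        # patched behaviour: grab while managing
    new_window.set()                 # start managing the new window
    got_frame = frame_ready.wait(timeout)  # the real wait has no timeout
    if grab_during_manage:
        server_grab.release()        # ungrab, which comes too late
    worker.join(timeout)
    return got_frame


if __name__ == "__main__":
    print("without grab:", simulate(False))  # True
    print("with grab:   ", simulate(True))   # False
```

In this model, dropping the grab is enough to break the cycle, which matches the observation that mutter without the patch works.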
I am not sure how mutter-SLE-bsc984738-grab-display.patch fixes the dialog size bug in https://bugzilla.suse.com/show_bug.cgi?id=984738 by grabbing the X server and delaying some keybinding/key grabs. I checked the log Hans Petter Jansson provided in https://bugzilla.gnome.org/show_bug.cgi?id=769387#c1, and I don't think it is a race condition: in the bad dialog section, the Oracle installer is clearly providing a height of 1 in its max/min size, so I think it is simply providing the wrong size... I'll do some tests once I get an Oracle installer binary...
(In reply to Alynx Zhou from comment #38)
> the oracle installed is clearly providing height of 1 in max/min
> size

Sorry, it's providing 0, and mutter enlarges it to 1 because 0 is not a valid value.
With help from Arun, I got the Oracle Installer 19c, which is the current long term support version. I tried to reproduce https://bugzilla.suse.com/show_bug.cgi?id=984738 with mutter-SLE-bsc984738-grab-display.patch disabled, and I cannot reproduce it. So it might be an issue of old Java/Oracle Installer, and it should be safe to drop that patch. I'll try more tomorrow to see whether it is fixed already.
https://build.opensuse.org/request/show/1143258 https://build.opensuse.org/request/show/1143259 https://build.opensuse.org/request/show/1143260
https://build.suse.de/request/show/320389
Can't it be closed as fixed now?
A submit request mentioning this bug was successfully integrated into the Beta3-202401. Please resolve the bug IF you consider it fixed, so that it can be eventually verified.
Delayed because of the Spring Festival holiday; closed.