Bug 1218935 - [Build V.95.1] Review mutter-SLE-bsc984738-grab-display.patch, which causes xterm to fail to launch on GNOME 45.
Status: RESOLVED FIXED
Alias: None
Product: PUBLIC SUSE Linux Enterprise Server 15 SP6
Classification: openSUSE
Component: GNOME
Version: unspecified
Hardware: Other Other
Importance: P2 - High : Normal
Target Milestone: ---
Assignee: Alynx Zhou
QA Contact:
URL: https://openqa.suse.de/tests/13276212...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-18 07:26 UTC by Grace Wang
Modified: 2024-06-26 21:10 UTC

See Also:
Found By: openQA
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---


Description Grace Wang 2024-01-18 07:26:37 UTC
Steps:
1. Install SLES + WE and the system role is "SLES with GNOME". (See https://openqa.suse.de/tests/13276212#step/accept_selected_role_SLES_with_GNOME/1)
2. Boot to desktop -> Press Alt-F2 -> Input "xterm" and then press "Enter"

Actual Result:
xterm cannot be launched on GNOME 45.


## Observation

openQA test in scenario sle-15-SP6-Online-V-Staging-x86_64-gnome@64bit fails in
[xterm](https://openqa.suse.de/tests/13276212/modules/xterm/steps/6)

## Test suite description
Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml. Maintainer: slindomansilla

The standard scenario where we mainly just follow installation suggestions without any adjustments as long as the default desktop is gnome.


## Reproducible

Fails since (at least) Build [V.80.2](https://openqa.suse.de/tests/13172329)


## Expected result

Last good: (unknown) (or more recent)


## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Online-V-Staging&machine=64bit&test=gnome&version=15-SP6)
Comment 1 Alynx Zhou 2024-01-18 07:54:02 UTC
Not sure if it is related, but the journal says gsd-usb-protect is always crashing...
Comment 2 Grace Wang 2024-01-18 08:16:04 UTC
Below are the detailed installation steps:

Insert a DVD or a bootable USB stick containing the installation image SLE-15-SP6-Online-x86_64-BuildV.95.1-Media1.iso
(available at https://openqa.suse.de/tests/13276234/asset/iso/SLE-15-SP6-Online-x86_64-BuildV.95.1-Media1.iso), then reboot the computer to start the installation program.

Select "SUSE Linux Enterprise Server 15SP6" as the product to install. (You can refer to this screenshot https://openqa.suse.de/tests/13276234#step/install_SLES/1)

Register via SCC; the SCC_URL is http://all-V.95.1.proxy.scc.suse.de

On the "Extension and Module selection" screen, check "Desktop Application Module 15SP6 x86_64" based on the default selection. (You can refer to https://openqa.suse.de/tests/13276234#step/register_module_desktop/1)

Follow the installation process.

Skip install addons (You can refer to https://openqa.suse.de/tests/13276234#step/skip_install_addons/1)

Select "SLES with GNOME" as the system role (You can refer to https://openqa.suse.de/tests/13276234#step/accept_selected_role_SLES_with_GNOME/1)

Then follow the installation process with suggested settings.
Please don't forget to create a user for your system.
Here is an overview of the installation settings.
https://openqa.suse.de/tests/13276234#step/launch_installation/1
Comment 3 Yifan Jiang 2024-01-18 08:19:22 UTC
(In reply to Alynx Zhou from comment #1)
> Not sure if it is related, but the journal says gsd-usb-protect is always
> crashing...

Alynx, could you please look into this in detail? I'd like to know exactly what blocks xterm from launching. Please follow the steps in comment #2 to get an identical system locally (if needed).
Comment 4 Alynx Zhou 2024-01-18 09:30:31 UTC
(In reply to Yifan Jiang from comment #3)
> (In reply to Alynx Zhou from comment #1)
> > Not sure if it is related, but the journal says gsd-usb-protect is always
> > crashing...
> 
> Alynx, could you please look into this in detail? I'd like to know exactly
> what blocks xterm from launching. Please follow the steps in comment #2 to
> get an identical system locally (if needed).

OK, will try tomorrow.
Comment 5 Eugenio Paolantonio 2024-01-19 01:02:26 UTC
Hi, latest build for Staging V is now 97.1 (still has the same problem).

Unfortunately tests (and :GA) were a bit degraded in the past weeks so I can't say for certain if this is due to the GNOME update or unrelated stuff.

Please let me know if I can help somehow. Thanks!
Comment 6 Grace Wang 2024-01-19 06:56:14 UTC
Yes, with the latest Staging build 97.1, some tests still fail.
https://openqa.suse.de/tests/13285242#

Among the failures, xterm failed due to this bug.
The failures of keymap_or_locale_x11, sshxterm, gedit and glxgears are also caused by this bug given the current situation (not sure if there will be new issues after the xterm issue is fixed)

The failure of shutdown (https://openqa.suse.de/tests/13285242#step/shutdown/9) should be fixed by adding new needles, so we can ignore it.

Many of our openQA test cases rely on xterm to apply settings.
So, if this issue can't be fixed before the next snapshot, we might face tons of failures on https://openqa.suse.de
Comment 7 Alynx Zhou 2024-01-19 07:12:25 UTC
I tried setting up a VM locally today with the same ISO used on openQA, and I cannot reproduce this bug.
Comment 9 Grace Wang 2024-01-19 08:04:03 UTC
Sure, I am preparing local env, will try and see if this can be reproduced with Build V.97.1 on my side. Will update you later.
Comment 10 Yifan Jiang 2024-01-19 09:42:22 UTC
(In reply to Grace Wang from comment #6)
> Yes, with the latest Staging build 97.1, some tests still fail.
> https://openqa.suse.de/tests/13285242#

Interesting, the journal shows xterm is launched by gnome-shell (Alt-F2):

> Jan 18 19:41:48.512196 susetest systemd[2469]: Started Application launched by gnome-shell.

and the xterm process is actually running, according to: https://openqa.suse.de/tests/13285242/logfile?filename=xterm-basic_health_check.txt

> bernhard 23459  2651  0.0  1.2  55608 12712 S    00:00:00 xterm
Comment 11 Eugenio Paolantonio 2024-01-19 17:34:26 UTC
Staging V has been merged. V has been frozen again, so GNOME 45 will still be there for testing. I will check whether it also happens on the GA build once we have one.
Comment 12 Grace Wang 2024-01-22 01:57:08 UTC
(In reply to Alynx Zhou from comment #7)
> I tried setting up a VM locally today with the same ISO used on openQA, and I
> cannot reproduce this bug.

I tried several times in my local environment (Build V.97.1), and this issue can be reproduced every time. xterm can't be launched, and the system hangs without any response.

@Alynx, could you please let me know whether you followed the same installation steps that I described in comment #2? Or you can find me on Slack.
Comment 13 Alynx Zhou 2024-01-23 01:12:40 UTC
(In reply to Grace Wang from comment #12)
> (In reply to Alynx Zhou from comment #7)
> @Alynx, could you please let me know whether you followed the same installation
> steps that I described in comment #2? Or you can find me on Slack.

I didn't find where I could set the SCC url, I think this is the problem?
Comment 14 Grace Wang 2024-01-23 01:28:12 UTC
(In reply to Alynx Zhou from comment #13)
> (In reply to Grace Wang from comment #12)
> > (In reply to Alynx Zhou from comment #7)
> > @Alynx, could you please let me know whether you followed the same installation
> > steps that I described in comment #2? Or you can find me on Slack.
> 
> I didn't find where I could set the SCC url, I think this is the problem?

You can specify the SCC url in the Boot Options just like in this step: https://openqa.suse.de/tests/13289913#step/bootloader_start/6
Comment 15 Alynx Zhou 2024-01-23 04:24:47 UTC
After setting the SCC URL, I could reproduce this bug.
Comment 16 Alynx Zhou 2024-01-24 07:11:43 UTC
It looks like this affects not only xterm: any server-side decorated window makes the desktop hang. I tried setting MUTTER_DEBUG and MUTTER_VERBOSE, but didn't find anything wrong.
Comment 17 Alynx Zhou 2024-01-24 08:48:51 UTC
I tried building and installing mutter and gnome-shell 45.3 on 15SP6, and this bug still exists. I am wondering whether it is an X problem; maybe we need to update the X server in 15SP6?
Comment 18 Alynx Zhou 2024-01-24 08:51:23 UTC
I also see some other problems on my VM. For example, if I restart the VM, sometimes the desktop does not show up (only the default X "x" cursor is shown), or GDM is fine but the GNOME session fails to start. @xiaoguang do you have the same problem?
Comment 19 Eugenio Paolantonio 2024-01-24 08:56:35 UTC
Same bug happens to me on the GA build, and as Alynx said, not only xterm is affected - easily reproducible with yast2 for example.

And indeed sometimes the session won't start at all - I'll try staging xorg from Factory in Staging:V, let's see...
Comment 20 Alynx Zhou 2024-01-24 09:13:37 UTC
I tried updating xorg-x11-server to 21.1.11, but this did not fix it.
Comment 21 Alynx Zhou 2024-01-24 09:13:53 UTC
GNOME Shell says the following:

Jan 24 17:10:23 bogon gnome-shell[3005]: Missing required core component Settings, expect trouble…


Is this expected?
Comment 22 Jia Zhaocong 2024-01-24 09:46:30 UTC
CCing X11 maintainer.

This is my observation: the system freezes after launching xterm (under X11 Gnome).

The system does not freeze under wayland Gnome.

Besides xterm, vncviewer and pidgin (which uses gtk2, which uses X11) also don't show up after launching.

(BTW, you can try this qcow2 image:
https://openqa.suse.de/tests/13328085/asset/hdd/SLE-15-SP6-x86_64-Build47.2-sled-gnome.qcow2)
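
Since the freeze is reported only under the X11 session and not under Wayland, it may help to confirm which session type a reproduction is actually running in. A minimal sketch, assuming the standard `XDG_SESSION_TYPE` environment variable is set by the login session; the helper names `session_type` and `is_x11_session` are made up for illustration:

```python
import os


def session_type(environ=None):
    """Return the session type ("x11", "wayland", ...) or "unknown" if unset."""
    env = os.environ if environ is None else environ
    return env.get("XDG_SESSION_TYPE", "unknown").lower()


def is_x11_session(environ=None):
    """True when the session reports itself as X11, where the hang was observed."""
    return session_type(environ) == "x11"
```

Calling `is_x11_session()` from a terminal inside the desktop session tells you whether the result of a reproduction attempt is comparable to the failing openQA scenario.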
Comment 23 Jia Zhaocong 2024-01-24 10:28:37 UTC
Even basic commands like "xlsclients" hang.

"ltrace" shows the hang at "xcb_connect".

"gdb" shows the hang at "xcb_connect_to_fd" <- "xcb_connect_to_display_with_auth_info".
Comment 24 Eugenio Paolantonio 2024-01-24 11:11:05 UTC
BTW IceWM works as expected for me.

I've tried downgrading to gnome-shell + mutter 44 and the xterm works: https://openqa.suse.de/tests/13329734

Note that this is without SLE-specific patches as they weren't rebased on 44.


I will now try again with 45 without SLE patches, then maybe trying to bisect changes from 44 -> 45.
Comment 25 Stefan Dirsch 2024-01-24 11:30:33 UTC
(In reply to Jia Zhaocong from comment #22)
> CCing X11 maintainer.
> 
> This is my observation: the system freezes after launching xterm (under X11
> Gnome).
> 
> The system does not freeze under wayland Gnome.
> 
> Besides xterm, vncviewer and pidgin (which uses gtk2, which uses X11) also
> don't show up after launching.
> 
> (BTW, you can try this qcow2 image:
> https://openqa.suse.de/tests/13328085/asset/hdd/SLE-15-SP6-x86_64-Build47.2-sled-gnome.qcow2)

Could you please tell me the root password for this system?
Comment 26 Stefan Dirsch 2024-01-24 11:36:48 UTC
(In reply to Jia Zhaocong from comment #23)
> Even basic commands like "xlsclients" hang.
> 
> "ltrace" shows the hang at "xcb_connect".
> 
> "gdb" shows the hang at "xcb_connect_to_fd" <-
> "xcb_connect_to_display_with_auth_info".

Could be that we also need a libxcb update for SP6. Not only the libX11 update. 

https://build.suse.de/project/monitor/home:sndirsch:X11:XOrg:sle15_sp6_testing

Replacing all installed xcb and libX11 RPMs would be an option.
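
To compare behavior before and after swapping RPMs, the hang at xcb_connect reported in comment 23 can be detected without freezing the tester's own shell by running the X client with a timeout. This is only a sketch: `probe_command` is a hypothetical helper, and the 5-second default is an arbitrary assumption, not a value taken from this bug.

```python
import subprocess


def probe_command(argv, timeout_s=5.0):
    """Run argv and classify the result as "ok", "failed", "hang" or "not-found".

    "hang" means the process did not exit within timeout_s seconds.
    """
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so nothing lingers.
        return "hang"
    except FileNotFoundError:
        return "not-found"
    return "ok" if result.returncode == 0 else "failed"
```

On an affected X11 session, `probe_command(["xlsclients"])` would be expected to return `"hang"`; on a fixed system it should return `"ok"`.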
Comment 27 Jia Zhaocong 2024-01-24 11:43:46 UTC
(In reply to Stefan Dirsch from comment #25)
> (In reply to Jia Zhaocong from comment #22)
> > CCing X11 maintainer.
> > 
> > This is my observation: the system freezes after launching xterm (under X11
> > Gnome).
> > 
> > The system does not freeze under wayland Gnome.
> > 
> > Besides xterm, vncviewer and pidgin (which uses gtk2, which uses X11) also
> > don't show up after launching.
> > 
> > (BTW, you can try this qcow2 image:
> > https://openqa.suse.de/tests/13328085/asset/hdd/SLE-15-SP6-x86_64-Build47.2-sled-gnome.qcow2)
> 
> Could you please tell me the root password for this system?

Told you via Slack.

Thanks for looking into this problem so quickly.
Comment 28 Eugenio Paolantonio 2024-01-24 12:31:20 UTC
> Could be that we also need a libxcb update for SP6. Not only the libX11 update. 
>
> https://build.suse.de/project/monitor/home:sndirsch:X11:XOrg:sle15_sp6_testing
>
> Replacing all installed xcb and libX11 RPMs would be an option.

I've tried with xcb and xcb-proto via Factory yesterday, but no change (not sure if the ones from your repo are different)

Follow-up on my test: mutter 45 + gnome-shell 45 without SLE-specific patches make xterm work: https://openqa.suse.de/tests/13329989#

Going to partially re-enable them and check again.
Comment 29 Stefan Dirsch 2024-01-24 13:07:20 UTC
I tried the libX11 and libxcb package updates from my repo, built against SP6, and I still see the same issue. Things work with a bare X server and icewm running on top.
Comment 30 Eugenio Paolantonio 2024-01-24 13:47:28 UTC
I think the following patch is what makes mutter misbehave:


# PATCH-FIX-SLE mutter-SLE-bsc984738-grab-display.patch bsc#984738 bgo#769387 -- Revert a upstream commit to avoid X11 race condition that results in wrong dialog sizes.
Patch1002:      mutter-SLE-bsc984738-grab-display.patch


Here is my own repo with this patch disabled; at least on my local VM it works:

https://download.suse.de/ibs/home:/epaolantonio:/branches:/SUSE:/SLE-15-SP6:/GA/standard/home:epaolantonio:branches:SUSE:SLE-15-SP6:GA.repo

Direct link to the rpm: https://download.suse.de/ibs/home:/epaolantonio:/branches:/SUSE:/SLE-15-SP6:/GA/standard/x86_64/mutter-45.1-150600.6.1.x86_64.rpm


I'm doing a rebuild in Staging:V with this patch disabled, and will trigger openQA tests again.
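
When enabling or disabling SLE-specific patches one at a time, it can help to look up which `PatchN` tag a given patch file has in the spec. A small sketch, assuming the usual `PatchN: file` line syntax of RPM spec files; `patch_tags` is a hypothetical helper and not part of any packaging tool:

```python
import re

# Matches lines such as "Patch1002:      some-file.patch"; comment lines are skipped
# because they start with "#" rather than "Patch".
PATCH_TAG = re.compile(r"^(Patch\d*):\s*(\S+)\s*$")


def patch_tags(spec_text):
    """Map each patch file name found in spec_text to its PatchN tag."""
    tags = {}
    for line in spec_text.splitlines():
        match = PATCH_TAG.match(line)
        if match:
            tags[match.group(2)] = match.group(1)
    return tags
```

For the spec fragment quoted in this comment, `patch_tags(...)["mutter-SLE-bsc984738-grab-display.patch"]` yields `"Patch1002"`.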
Comment 31 Stefan Dirsch 2024-01-24 14:25:35 UTC
I can confirm replacing mutter with your new version without that patch fixes the issue. I've tested the system with updated libxcb/libX11 packages though.
Comment 33 Eugenio Paolantonio 2024-01-24 17:36:49 UTC
Thanks for testing! It took a bit, but I also ran :V again with the patch disabled, and it looks all green: https://openqa.suse.de/tests/13332422

I'm keeping this bug open for feedback from the desktop team. Is the patch still needed and if so, could you please take a look?

Thanks in advance!
Comment 34 Alynx Zhou 2024-01-25 02:03:03 UTC
It seems that patch was added for a fairly old bug, and grabbing the X server is not recommended by upstream...

CCing the original patch submitter for an opinion.
Comment 35 Yifan Jiang 2024-01-26 07:48:38 UTC
Thanks Eugenio, Stefan and everyone! I've updated the title for further tracking: we'll need to decide how to deal with this patch (remove it or rewrite it based on the current window decoration infrastructure).
Comment 36 Alynx Zhou 2024-01-26 09:58:27 UTC
I have some ideas about the cause. In https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2175, the X11 server-side decorations (frames) were moved into an independent process called mutter-x11-frames. The likely cause is that the patch grabs the X server while an X11 window is being managed, which blocks the connection between mutter-x11-frames and the X server, so it cannot create a frame for xterm. I'll try to solve this.
Comment 37 Alynx Zhou 2024-01-26 11:40:06 UTC
Some more detail on what happens with mutter-SLE-bsc984738-grab-display.patch:

This patch grabs the X server while a window is being managed, so other programs cannot interact with the X server via their own connections. A server-side decorated window actually consists of 2 X windows: one for the program itself and another for the frame.

Before GNOME 45 (or maybe 44), the frame was drawn inside mutter's own process, so grabbing the X server did not block it. But now the frame is drawn by the mutter-x11-frames process, so if mutter grabs the X server, mutter-x11-frames cannot iterate X windows or create X windows for frames.

So when mutter gets a new X window from a program, it grabs the X server because of that patch, waits for a frame X window from mutter-x11-frames while managing the window, and only then ungrabs the X server. But because the X server is grabbed, mutter-x11-frames cannot iterate X windows or create the frame X window until the X server is ungrabbed. So they wait for each other forever...
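
The circular wait described above can be illustrated with a toy model. This is not mutter code and does not talk to an X server: a lock stands in for the server grab, a thread stands in for mutter-x11-frames, and timeouts are added only so the sketch terminates instead of hanging like the real desktop. All names are hypothetical.

```python
import threading


def manage_window(grab_server, timeout_s=0.5):
    """Return True if the frame window arrives, False if the wait times out."""
    server = threading.Lock()        # held by whoever has "grabbed" the server
    frame_ready = threading.Event()  # set once the frames client created a frame
    new_window = threading.Event()   # set once the application window is announced

    def frames_client():
        new_window.wait()
        # Requests from another connection stall while the server is grabbed.
        if server.acquire(timeout=timeout_s * 2):
            try:
                frame_ready.set()
            finally:
                server.release()

    worker = threading.Thread(target=frames_client)
    worker.start()

    if grab_server:
        server.acquire()             # stands in for grabbing the X server
    try:
        new_window.set()
        # The window manager waits for the frame while (possibly) holding the grab.
        return frame_ready.wait(timeout=timeout_s)
    finally:
        if grab_server:
            server.release()         # stands in for ungrabbing the X server
        worker.join()
```

In this model `manage_window(grab_server=False)` succeeds, while `manage_window(grab_server=True)` times out, because the frame can only be created after the grab is released, and the grab is only released after the wait for the frame ends.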
Comment 38 Alynx Zhou 2024-01-26 11:43:22 UTC
I am not sure how mutter-SLE-bsc984738-grab-display.patch fixes the dialog size bug in https://bugzilla.suse.com/show_bug.cgi?id=984738 by grabbing the X server and delaying some keybinding/key grabs. I checked the log Hans Petter Jansson provided in https://bugzilla.gnome.org/show_bug.cgi?id=769387#c1, and I don't think it is a race condition: in the bad dialog section, the Oracle installer is clearly providing a height of 1 in its max/min size, so I think it is providing a wrong size... I'll do some tests once I get an Oracle installer binary...
Comment 39 Alynx Zhou 2024-01-26 11:48:19 UTC
(In reply to Alynx Zhou from comment #38)
> the Oracle installer is clearly providing a height of 1 in its max/min
> size

Sorry, it's providing 0, and mutter enlarges it to 1 because 0 is not a valid value.
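
The clamping described here can be sketched as follows. This only illustrates the behavior stated in this comment (a 0 in a size hint is raised to 1); it is not mutter's actual implementation, the function name is hypothetical, and the 300-pixel width is an example value rather than one taken from the log.

```python
def normalize_size_hints(min_size, max_size, smallest=1):
    """Raise any width/height below `smallest` to `smallest` in both hints.

    min_size and max_size are (width, height) tuples as a client might report them.
    """
    def clamp(size):
        width, height = size
        return (max(smallest, width), max(smallest, height))

    return clamp(min_size), clamp(max_size)


# A dialog that reports height 0 in both hints ends up pinned to height 1.
hints = normalize_size_hints((300, 0), (300, 0))
```

With both the minimum and maximum height forced to 1, the window manager has no room to give the dialog a usable height, which would match the tiny dialog seen in the original bug.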
Comment 40 Alynx Zhou 2024-01-31 16:03:38 UTC
With help from Arun, I got the Oracle Installer 19c, which is the current long-term support version. I tried to reproduce https://bugzilla.suse.com/show_bug.cgi?id=984738 with mutter-SLE-bsc984738-grab-display.patch disabled, and I cannot reproduce it. So it might be an issue with the old Java/Oracle Installer, and it should be safe to drop that patch.

I'll try more tomorrow to see whether it is fixed already.
Comment 42 Alynx Zhou 2024-02-04 01:10:07 UTC
https://build.suse.de/request/show/320389
Comment 43 Stefan Dirsch 2024-02-09 14:03:55 UTC
Can't it be closed as fixed now?
Comment 45 Radoslav Tzvetkov 2024-02-14 15:21:48 UTC
A submit request mentioning this bug was successfully integrated into the Beta3-202401. 
Please resolve the bug IF you consider it fixed, so that it can be eventually verified.
Comment 46 Alynx Zhou 2024-02-18 06:39:38 UTC
Delayed because of the Spring Festival holiday. Closed.