Bug 1216197 - testX hangs at second stage
Summary: testX hangs at second stage
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: AutoYaST
Version: Leap 15.5
Hardware: x86-64 Other
Priority: P5 - None, Severity: Normal
Target Milestone: ---
Assignee: E-mail List
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-10-13 06:43 UTC by Daniel Spannbauer
Modified: 2023-12-14 16:30 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
y2start.log (8.13 KB, text/plain)
2023-10-13 08:21 UTC, Daniel Spannbauer
ZIP file with a new testX binary from the draft fix (10.20 KB, application/zip)
2023-10-19 13:57 UTC, Stefan Hundhammer

Description Daniel Spannbauer 2023-10-13 06:43:40 UTC
On an installation with AutoYaST, testX hangs after the reboot, when the second stage starts.


  731 tty1     Ss+    0:00 /bin/bash /usr/lib/YaST2/startup/YaST2.Second-Stage
 2391 tty1     S+     0:00  \_ /bin/bash /usr/lib/YaST2/startup/YaST2.call installation continue
 2908 tty7     Ssl+   0:00      \_ Xorg -noreset -br -nolisten tcp -deferglyphs 16 vt07
 2963 tty1     S+     0:00      \_ /usr/lib/YaST2/bin/testX
 2965 tty1     Z+     0:00          \_ [testX] <defunct>

This hangs forever until you log in via ssh and kill the parent process (here PID 2963).

I don't know whether this is also significant for YaST2 and the installation process.
Comment 1 Daniel Spannbauer 2023-10-13 08:21:11 UTC
Created attachment 870142 [details]
y2start.log

y2start log with hanging testX
Comment 2 Daniel Spannbauer 2023-10-16 09:24:47 UTC
testX seems to hang when no window manager is available.
In my opinion it should fail and exit with 1. But it simply hangs.
Comment 3 Stefan Hundhammer 2023-10-16 12:48:07 UTC
Indeed, it should exit. And it has always done that; the code has been unchanged for 11 years.

  https://github.com/yast/yast-x11/blob/master/src/tools/testX.c

In your case, the child process exited but remained as a zombie process ("<defunct>", "Z+" in the process tree output), i.e. the kernel is waiting for the parent process to do a 'wait' for that child to collect its exit status.

But it does call 'waitpid()'.
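
For reference, a minimal sketch of how a parent normally clears such a zombie. This is not the code from testX.c; 'spawn_and_reap()' is a hypothetical helper, and the blocking 'waitpid()' (flags = 0) is an assumption chosen for illustration:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from testX.c): fork a child that exits at once
 * with `code`, then reap it. Returns the child's exit code, or -1 on error. */
int spawn_and_reap(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */

    if (pid == 0)
        _exit(code);                    /* child: terminate immediately */

    /* From the child's exit until waitpid() succeeds, the child shows up
     * as a zombie ("Z", "<defunct>") in the process table. */
    int status = 0;
    pid_t reaped;
    do {
        reaped = waitpid(pid, &status, 0);   /* blocking wait */
    } while (reaped < 0 && errno == EINTR);  /* retry if interrupted */

    if (reaped != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Once 'waitpid()' has returned the child's PID, the kernel drops the process table entry and the "<defunct>" line disappears.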
Comment 4 Stefan Hundhammer 2023-10-16 13:30:28 UTC
Calling code:

https://github.com/yast/yast-installation/blob/master/startup/YaST2.call#L33-L71


In the attached y2start.log, we see


Stage [call]: Check selected medium...
Stage [call]: ========================
	|-- Wished medium is: QT
	|-- Selected medium is: QT
	|-- TestX: XOpenDisplay failed
	|-- TestX: XOpenDisplay failed

That is, 'testX' exited with 1 twice because the X server wasn't yet ready to accept a connection. For a VNC installation (which this was not), the YaST2.call script tries this only once; otherwise it tries up to 15 times:

https://github.com/yast/yast-installation/blob/master/startup/YaST2.call#L257

  wait_for_x11 15
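
The retry idea behind that call can be sketched as follows. The real 'wait_for_x11' is a shell function in YaST2.call; this C version only illustrates the pattern. 'retry_probe()', 'fake_probe()' and the delay parameter are hypothetical, and the stub is set up to fail twice to mirror the two "XOpenDisplay failed" lines in the log:

```c
#include <unistd.h>

/* A probe returns 0 on success and non-zero on failure, like an exit status. */
typedef int (*probe_fn)(void);

/* Hypothetical helper: call `probe` up to `max_attempts` times, sleeping
 * `delay_seconds` between failed attempts. Returns the number of the
 * attempt that succeeded, or -1 if all attempts failed. */
int retry_probe(probe_fn probe, int max_attempts, unsigned delay_seconds)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (probe() == 0)
            return attempt;
        if (attempt < max_attempts)
            sleep(delay_seconds);       /* give the X server time to start */
    }
    return -1;
}

/* Stand-in for running testX: fails while the counter is positive. */
static int calls_left_to_fail = 2;

static int fake_probe(void)
{
    return calls_left_to_fail-- > 0 ? 1 : 0;
}
```

With the stub above, 'retry_probe(fake_probe, 15, 1)' would fail twice and succeed on the third attempt.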

Since we see a child process in the process tree output, it must hang somewhere in 'RunWindowManager()':

https://github.com/yast/yast-x11/blob/master/src/tools/testX.c#L184-L206

Since the child process exited, it obviously could start none of 'icewm', 'fvwm2', 'mwm', 'twm'; which is not surprising since very likely none of them is installed.

So it exited with 0, and the parent process should have picked up that exit status, since all it ever did was wait with 'waitpid()' for that child process to exit.
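
The "try several window managers, exit if none can be started" pattern can be sketched like this. It is not the actual 'RunWindowManager()' code; 'run_first_available()' is a hypothetical helper, and the exit code used when nothing can be executed is a parameter here because it is an assumption:

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: in a child process, try to exec each program in the
 * NULL-terminated list `names`. If none can be started, the child exits
 * with `fallback_code`. The parent waits and returns the child's exit
 * code, or -1 on error. */
int run_first_available(const char *const names[], int fallback_code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        for (size_t i = 0; names[i] != NULL; i++) {
            /* execlp() searches PATH and returns only if it failed. */
            execlp(names[i], names[i], (char *)NULL);
        }
        _exit(fallback_code);           /* nothing could be started */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If one of the programs does start, the child is replaced by it, and the parent's 'waitpid()' returns only when that program terminates.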
Comment 5 Stefan Hundhammer 2023-10-16 13:32:22 UTC
Daniel, is this reproducible, or was it a one-time problem?
Comment 6 Daniel Spannbauer 2023-10-16 13:36:17 UTC
Hi Stefan,

I can reproduce it on a Core i5 machine here at my office, and I had the same problem on a Core i7 machine at a customer site (Intel NUC NUC11PHKi7C), but that NUC is in China, so it is not under my control.
Comment 7 Stefan Hundhammer 2023-10-16 14:09:24 UTC
OK, thanks.

That leaves me wondering what's wrong here.

The 'waitpid()' call with the WNOHANG flag may return prematurely if the parent process is much faster than the child process (which would be quite normal since it tries several 'exec..()' calls before exiting). That's also a bit broken, but not in a way that this should hang; to the contrary, in that case the parent process will exit immediately, the child process will be orphaned, and it will be picked up by 'init' which in such a case always does a 'wait()' call to avoid zombie processes filling up the process table.
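
A small sketch of that WNOHANG behavior, under stated assumptions: 'wnohang_returns_early()' is a hypothetical helper, and the pipe is only there to keep the child alive deterministically while the parent polls. It is not how testX.c is written:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if waitpid(..., WNOHANG) returned 0 while
 * the child was still running and a later blocking waitpid() reaped it;
 * returns 0 if that was not observed, -1 on setup errors. */
int wnohang_returns_early(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        char c;
        close(fds[1]);
        /* Block until the parent closes its write end (read sees EOF). */
        if (read(fds[0], &c, 1) < 0)
            _exit(1);
        _exit(0);
    }

    close(fds[0]);

    int status = 0;
    /* The child is still blocked in read(), so this returns 0 at once:
     * "no child has changed state yet". */
    pid_t early = waitpid(pid, &status, WNOHANG);

    close(fds[1]);                          /* now let the child exit */
    pid_t late = waitpid(pid, &status, 0);  /* blocking wait reaps it */

    return (early == 0 && late == pid && WIFEXITED(status)) ? 1 : 0;
}
```

A return value of 0 from 'waitpid()' with WNOHANG means only "nothing to collect yet"; a parent that treats it as "child finished" moves on without ever reaping the child.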

But quite obviously the parent process hangs, and I don't see where else it could hang.
Comment 8 Daniel Spannbauer 2023-10-17 06:46:24 UTC
If you have an idea how to debug this, let me know. I can reproduce it any time.
Comment 9 Stefan Hundhammer 2023-10-19 13:55:32 UTC
Draft fix:

  https://github.com/yast/yast-x11/pull/29
Comment 10 Stefan Hundhammer 2023-10-19 13:57:05 UTC
Created attachment 870331 [details]
ZIP file with a new testX binary from the draft fix

Unzip and move the testX binary to

  /usr/lib/YaST2/bin/testX
Comment 11 Stefan Hundhammer 2023-10-19 13:58:24 UTC
Daniel, could you test this version?

You can either build it from source from the https://github.com/yast/yast-x11/tree/huha-fix-testx branch, or simply unzip and move the attachment from comment #10; whichever you prefer.
Comment 12 Stefan Hundhammer 2023-10-19 14:02:37 UTC
Building from source:

  git clone -o upstream git@github.com:yast/yast-x11.git
  cd yast-x11
  git checkout -b huha-fix-testx upstream/huha-fix-testx

  make -f Makefile.cvs
  make && sudo make install



Needed packages:

grep BuildRequires package/*.spec

BuildRequires:  autoconf
BuildRequires:  automake
BuildRequires:  gcc-c++
BuildRequires:  libtool
BuildRequires:  xorg-x11-libX11-devel
BuildRequires:  xorg-x11-libXmu-devel
BuildRequires:  yast2-devtools >= 3.1.10

i.e. 

sudo zypper in autoconf automake gcc-c++ libtool xorg-x11-libX11-devel xorg-x11-libXmu-devel yast2-devtools
Comment 13 Daniel Spannbauer 2023-10-23 08:09:19 UTC
Hello Stefan,

Thanks, it seems to work.
After copying it to the system and rebooting, the Xorg server started and the graphical YaST did its work.

If you need further information, please let me know.

Regards

Daniel
Comment 14 Stefan Hundhammer 2023-10-23 08:57:45 UTC
Okay, thanks for testing this!
Comment 15 Stefan Hundhammer 2023-10-23 13:58:02 UTC
SR / MR to SLE-15-SP5 as yast2-x11-4.5.2:

  https://build.suse.de/request/show/311275

Maintenance team, please notice:

This does not need to become an installer self-update, since only an optional second stage of the installation is affected, which for most products only affects AutoYaST scenarios where a second stage is explicitly requested.
Comment 16 Stefan Hundhammer 2023-10-23 13:59:23 UTC
More detailed information in the PR:

  https://github.com/yast/yast-x11/pull/29/files

Merge PRs to SLE-15-SP6 and finally to master / Factory are on the way.
Comment 17 Stefan Hundhammer 2023-10-23 14:46:59 UTC
Merge to SLE-15-SP6 (as yast2-x11-4.6.2):

  https://github.com/yast/yast-x11/pull/31

SR to IBS SLE-15-SP6:

  https://build.suse.de/request/show/311286
Comment 21 Maintenance Automation 2023-12-14 16:30:29 UTC
SUSE-RU-2023:4860-1: An update that has one fix can now be installed.

Category: recommended (moderate)
Bug References: 1216197
Sources used:
openSUSE Leap 15.5 (src): yast2-x11-4.5.2-150500.3.5.1
Basesystem Module 15-SP5 (src): yast2-x11-4.5.2-150500.3.5.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.