Bugzilla – Bug 1220723
[Build 59.2] openQA test fails in first_boot - display manager can not be shown after migration from 12SP5 to 15SP6
Last modified: 2024-03-11 01:06:18 UTC
Created attachment 873141 (serial0.txt)

## Observation

This test migrates SLES 12SP5 with GNOME to 15SP6. After the migration and reboot, the display manager sporadically fails to appear. The failure has only been seen on ppc64le, and it reproduces in about 1 of 5 runs. Once it happens, the system seems to hang: switching to another tty to read the journal log does not work.

When I switched the test to the multipath target, I could not reproduce the issue any more. Example multipath run where the login screen is shown: https://openqa.suse.de/tests/13639655#step/first_boot/5. In the jobs prefixed "multipath" at https://openqa.suse.de/tests/overview?version=15-SP6&build=lemon-suse%2Fos-autoinst-distri-opensuse%23ppc64le-fisrt-boot-slow&distri=sle, 0/10 reproduced the issue.

openQA test in scenario sle-15-SP6-Regression-on-Migration-from-SLE12-SPx-ppc64le-offline_sles12sp5_pscc_sdk-lp-asmm-contm-lgm-tcm-wsm-pcm_all_full@ppc64le-4g fails in [first_boot](https://openqa.suse.de/tests/13636406/modules/first_boot/steps/4)

## Test suite description

The base test suite is used for job templates defined in YAML documents. It has no settings of its own.

## Reproducible

Fails since (at least) Build [53.1](https://openqa.suse.de/tests/13458620)

## Expected result

Last good: (unknown) (or more recent)

## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Regression-on-Migration-from-SLE12-SPx&machine=ppc64le-4g&test=offline_sles12sp5_pscc_sdk-lp-asmm-contm-lgm-tcm-wsm-pcm_all_full&version=15-SP6)
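To put the multipath result in context, here is a small sketch of the arithmetic. It assumes every run fails independently with the same probability (the roughly 1/5 rate reported above), which real openQA runs may not satisfy. Under that assumption, 10 passing runs in a row would still happen about 0.8^10 ≈ 10.7% of the time, so 0/10 is suggestive rather than conclusive. `prob_all_pass` and `runs_needed` are hypothetical helpers, not part of openQA.

```python
def prob_all_pass(fail_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass.

    Assumes each run fails independently with probability `fail_rate`.
    """
    if not 0.0 <= fail_rate <= 1.0:
        raise ValueError("fail_rate must be in [0, 1]")
    return (1.0 - fail_rate) ** runs


def runs_needed(fail_rate: float, confidence: float) -> int:
    """Smallest number of clean runs such that seeing no failure purely by
    chance has probability at most 1 - confidence."""
    if not 0.0 < fail_rate <= 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("need 0 < fail_rate <= 1 and 0 < confidence < 1")
    n = 0
    while prob_all_pass(fail_rate, n) > 1.0 - confidence:
        n += 1
    return n


if __name__ == "__main__":
    print(f"P(10 clean runs | p=0.2) = {prob_all_pass(0.2, 10):.3f}")  # 0.107
    print(f"Clean runs for 95% confidence: {runs_needed(0.2, 0.95)}")  # 14
```

With the same assumptions, about 14 clean multipath runs would be needed before a chance streak drops below 5%.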
It seems there are no relevant logs from the display manager or the GNOME session. I also checked the systemctl output: there is not even a graphical target.
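A sketch of how the systemd state could be captured in a failing run, assuming a shell is reachable. `systemctl get-default` reports the default boot target, and `systemctl list-units --type=target --all --plain --no-legend` lists targets with their LOAD/ACTIVE/SUB states. The hypothetical `unit_state` helper picks one unit out of that listing, so a missing or inactive `graphical.target` is reported explicitly.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple


def unit_state(list_units_output: str, unit: str) -> Optional[Tuple[str, str, str]]:
    """Return (LOAD, ACTIVE, SUB) for `unit` from `systemctl list-units
    --plain --no-legend` output, or None if the unit is not listed at all."""
    for line in list_units_output.splitlines():
        # --plain should already omit the bullet that marks failed units;
        # strip it anyway in case the output came from a run without it.
        fields = line.lstrip("● ").split(None, 4)
        if len(fields) >= 4 and fields[0] == unit:
            return fields[1], fields[2], fields[3]
    return None


def capture(cmd: List[str]) -> str:
    """Run a command and return its stdout ('' if the tool is missing)."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


if __name__ == "__main__":
    print("default target:", capture(["systemctl", "get-default"]).strip() or "unknown")
    units = capture(["systemctl", "list-units", "--type=target", "--all",
                     "--plain", "--no-legend"])
    print("graphical.target:", unit_state(units, "graphical.target") or "not listed")
```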
Hi, we reproduced this issue on the latest build, 62.1: https://openqa.suse.de/tests/13714553#step/first_boot/4 Are any logs or other information needed? We will try to provide them to help clarify things. Thanks.
(In reply to Ming Li from comment #4)
> Hi, we reproduced this issue on latest build 62.1,
> https://openqa.suse.de/tests/13714553#step/first_boot/4

I can't find the journal log in this job. Could you collect the journal log with "journalctl -b" and the package information with "rpm -qa"?
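Once a shell is reachable (for example over ssh), the two requested commands could be captured into files along these lines. This is a minimal sketch: the output file names and the `collect_logs` helper are hypothetical, and `--no-pager` only keeps `journalctl` non-interactive. Run it as root, otherwise the journal may be incomplete.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Dict, Optional

# Commands requested in the bug; the output file names are hypothetical.
COMMANDS = {
    "journalctl-b.txt": ["journalctl", "-b", "--no-pager"],  # journal of the current boot
    "rpm-qa.txt": ["rpm", "-qa"],  # installed package list
}


def collect_logs(outdir: Path) -> Dict[str, Optional[int]]:
    """Run each command, save its stdout under `outdir`, and return
    {file name: exit code}, with None where the tool is not installed."""
    outdir.mkdir(parents=True, exist_ok=True)
    status: Dict[str, Optional[int]] = {}
    for name, cmd in COMMANDS.items():
        if shutil.which(cmd[0]) is None:
            status[name] = None
            continue
        # errors="replace" keeps undecodable bytes in the journal from aborting the run.
        result = subprocess.run(cmd, capture_output=True, text=True,
                                errors="replace", check=False)
        (outdir / name).write_text(result.stdout)
        status[name] = result.returncode
    return status


if __name__ == "__main__":
    print(collect_logs(Path("bug1220723-logs")))
```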
(In reply to xiaoguang wang from comment #5)
> Could you collect the journal log by "journalctl -b", and the package
> information by "rpm -qa".

I tried to switch tty when the failure happened, but it did not work, so I can't provide the journal log; the system seems to hang. I only see these messages in the worker log:

[2024-03-07T03:31:18.125684Z] [debug] [pid:77254] QEMU: KVM: Failed to create TCE64 table for liobn 0x80000000
[2024-03-07T03:32:47.256961Z] [debug] [pid:77254] QEMU: KVM: Failed to create TCE64 table for liobn 0x80000001

I'm not sure whether they are related to the issue. I also tried disabling kdump and cloning the test 10 times, but the issue still reproduced in at least 4/10 runs: https://openqa.suse.de/tests/overview?distri=sle&build=lemon-suse%2Fos-autoinst-distri-opensuse%23master&version=15-SP6 (jobs prefixed "no-kdump")
(In reply to Ming Li from comment #6)
> I tried to switch tty when the failure happened but failed, so can't provide
> the journal log, it seems the system is hang.

If a problem in the graphical environment is suspected here, we really need those logs to diagnose it further. What about the network status: does ssh work, so that the logs can be collected that way?
(In reply to Yifan Jiang from comment #7)
> If the graphical environment issue is suspected here, the necessary logs
> make much better sense to diagnose it further. How about the network status,
> does ssh work to collect logs?

I'm trying to check whether ssh works when the issue happens, but the issue is random, and unluckily I haven't hit it while running in developer mode. Also, the migration test currently has an ssh issue: after migration, root login is disabled, so root login needs to be enabled before first_boot.

I have another idea: on one worker, 'mania', switching tty sometimes works when the issue happens. So one option is to create a branch and run the test more times to get the needed logs; another is to update the openQA worker and check the results.
(In reply to Ming Li from comment #8)
> I'm trying to check whether ssh work or not when issue happened, but it is a
> random issue and un-luky for me that I haven't met this issue when I set
> developer mode.
> Besides, currently migration test have a ssh issue that after migration the
> root login is disabled, so need enable root login before first_boot.

I think I just reproduced the issue in developer mode, but I can't log in over ssh because root login is disabled. I tried to edit /etc/ssh/sshd_config over VNC, but switching tty failed.
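A sketch of the sshd_config change discussed above, done as a plain text edit so it could run before first_boot instead of over VNC. It relies on OpenSSH's documented rule that, unless noted otherwise, the first value obtained for a keyword is used, so a `PermitRootLogin yes` line at the very top (global section, before any `Match` block) takes precedence over a later setting. The `enable_root_login` and `apply` helpers are hypothetical; sshd still has to be restarted (for example `systemctl restart sshd`) for the change to take effect.

```python
from pathlib import Path

LINE = "PermitRootLogin yes"


def enable_root_login(config_text: str) -> str:
    """Return sshd_config text with `PermitRootLogin yes` as its first line.

    sshd uses the first value it reads for most keywords, so a line at the
    top overrides a later PermitRootLogin setting in the global section.
    """
    if config_text.splitlines()[:1] == [LINE]:
        return config_text  # already applied; keep the edit idempotent
    return f"{LINE}\n{config_text}"


def apply(path: Path = Path("/etc/ssh/sshd_config")) -> None:
    """Rewrite the file in place (needs root), keeping a .bak copy."""
    original = path.read_text()
    path.with_suffix(path.suffix + ".bak").write_text(original)
    path.write_text(enable_root_login(original))
```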