Bug 1214452 - [Build 13.4] Can't select user serial terminal after textmode installation
Summary: [Build 13.4] Can't select user serial terminal after textmode installation
Status: RESOLVED FIXED
Alias: None
Product: PUBLIC SUSE Linux Enterprise Server 15 SP6
Classification: openSUSE
Component: systemd (show other bugs)
Version: unspecified
Hardware: Other Other
: P3 - Medium : Normal
Target Milestone: ---
Assignee: systemd maintainers
QA Contact:
URL: https://openqa.suse.de/tests/11895814...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-22 06:14 UTC by Huajian Luo
Modified: 2023-09-20 06:30 UTC (History)
5 users (show)

See Also:
Found By: openQA
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---


Attachments
yast2log (2.85 MB, application/x-bzip)
2023-08-22 06:16 UTC, Huajian Luo
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Huajian Luo 2023-08-22 06:14:24 UTC
## Desciption:
  
When run yast installation of the sle 15sp6 builds 13.4 with textmode. we found that after installation, we can't select the user serial terminal bernhard.

## Error Log:

'''
# Test died: Failed to wait for login prompt at sle/lib/serial_terminal.pm line 171.
	serial_terminal::login("bernhard", "\$ ") called at sle/lib/susedistribution.pm line 827
	susedistribution::activate_console(Distribution::Sle::15_current=HASH(0x5603e16005e0), "user-virtio-terminal") called at /usr/lib/os-autoinst/testapi.pm line 1695
	testapi::select_console("user-virtio-terminal") called at sle/lib/serial_terminal.pm line 392
	serial_terminal::select_serial_terminal(0) called at sle/lib/serial_terminal.pm line 404
	serial_terminal::select_user_serial_terminal() called at sle/tests/console/consoletest_finish.pm line 66

'''

## Observation

openQA test in scenario sle-15-SP6-Online-x86_64-textmode@uefi fails in
[consoletest_finish](https://openqa.suse.de/tests/11895814/modules/consoletest_finish/steps/7)

## Test suite description
Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml. Maintainer: QE Core

Installation in textmode and selecting the textmode "desktop" during installation.


## Reproducible

Fails since (at least) Build [13.4](https://openqa.suse.de/tests/11861553)


## Expected result

Last good: [10.3](https://openqa.suse.de/tests/11781620) (or more recent)


## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Online&machine=uefi&test=textmode&version=15-SP6)
Comment 1 Huajian Luo 2023-08-22 06:16:12 UTC
Created attachment 868930 [details]
yast2log
Comment 2 Lukas Ocilka 2023-08-22 07:29:34 UTC
This does not seem to have anything with the Installation itself. The user is created and can login. Please, debug yourself and try to find out more first.
Comment 3 Huajian Luo 2023-08-22 09:17:35 UTC
Thank you for the information. I've tried to select the user serial terminal after the installation finished and just rebooted, which resulted in the same error.

https://openqa.suse.de/tests/11896397#step/consoletest_seerror
Comment 4 Radoslav Tzvetkov 2023-08-24 12:29:16 UTC
Huanjian, any other ideas on how you can debug it?
Comment 5 Richard Fan 2023-09-14 06:23:35 UTC
I did some investigation via openQA dev mode, and I can find that the service "serial-getty@hv1.service" is not enabled/started on the test virtual machine.
At the same time, no issue with service "serial-getty@hv0.service"

It might be the right service to connect to the user serial terminal device which
defined in backend qemu command:

-device virtio-serial \
-chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on \
-device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console \
-chardev pipe,id=virtio_console_user,path=virtio_console_user,logfile=virtio_console_user.log,logappend=on \
-device virtconsole,chardev=virtio_console_user,name=org.openqa.console.virtio_console_user \
-chardev socket,path=qmp_socket,server=on,wait=off,id=qmp_socket,logfile=qmp_socket.log,logappend=on \
-qmp chardev:qmp_socket \

Hope it can help something.
Comment 6 Richard Fan 2023-09-14 06:27:38 UTC
I forgot something, after enabling/starting the service, I did the below steps:

#echo "abc" > /dev/hvc1

From openqa worker, I can see it :)
#cat virtio_console_user.log 
abc
Comment 7 Takashi Iwai 2023-09-14 09:09:39 UTC
Why it's assigned to kernel?  Is it a regression in kernel?
Comment 8 Richard Fan 2023-09-14 13:03:29 UTC
Well, the openQA test passed after I tried to add some test code to enable/start the `serial-getty@hvc1.service` before starting this job.

#systemctl enable serial-getty@hvc1.service
#systemctl start serial-getty@hvc1.service

passed job: https://openqa.suse.de/tests/12125695#

So the issue might have something to do with this service not started even the serial device is attached to the virtual machine.
Comment 9 Thomas Blume 2023-09-18 09:33:49 UTC
(In reply to Richard Fan from comment #8)
> Well, the openQA test passed after I tried to add some test code to
> enable/start the `serial-getty@hvc1.service` before starting this job.
> 
> #systemctl enable serial-getty@hvc1.service
> #systemctl start serial-getty@hvc1.service
> 
> passed job: https://openqa.suse.de/tests/12125695#
> 
> So the issue might have something to do with this service not started even
> the serial device is attached to the virtual machine.

I could reproduce the issue on our team internal openQA instance and investigated after the test failure.
There is no indication of a problem with the getty on hvc0 (I don't have a hvc1).
However, what I can see in both, your openQA test and also mine is a "permission denied" when openQA wants to write its status to the serial console, e.g.:

https://openqa.suse.de/tests/12119977#step/consoletest_finish/5

This is just before the wait_serial error.
Is it possible that the openQA error log is a bit misleading and the true error is the failing status write?

I'd suspect that even more considering comment#6 where writing to hvc1 is apparently successfull.
Comment 10 Thomas Blume 2023-09-18 09:39:10 UTC
btw. an earlier test, starting from:

https://openqa.suse.de/tests/12119977#step/keymap_or_locale/5

also shows the "permission denied" and seems to go wrong afterwards, though it doesn't seem to be fatal for the overall test run.
Comment 11 Thomas Blume 2023-09-18 09:54:36 UTC
(In reply to Thomas Blume from comment #10)
> btw. an earlier test, starting from:
> 
> https://openqa.suse.de/tests/12119977#step/keymap_or_locale/5
> 
> also shows the "permission denied" and seems to go wrong afterwards, though
> it doesn't seem to be fatal for the overall test run.

The writing still succeeds a bit earlier in the test run:

https://openqa.suse.de/tests/12119977#step/consoletest_setup/62
https://openqa.suse.de/tests/12119977#step/consoletest_setup/63
Comment 12 Thomas Blume 2023-09-18 09:57:43 UTC
(In reply to Thomas Blume from comment #11)
> (In reply to Thomas Blume from comment #10)
> > btw. an earlier test, starting from:
> > 
> > https://openqa.suse.de/tests/12119977#step/keymap_or_locale/5
> > 
> > also shows the "permission denied" and seems to go wrong afterwards, though
> > it doesn't seem to be fatal for the overall test run.
> 
> The writing still succeeds a bit earlier in the test run:
> 
> https://openqa.suse.de/tests/12119977#step/consoletest_setup/62
> https://openqa.suse.de/tests/12119977#step/consoletest_setup/63

But that's done by root instead of the user bernhard.
So, maybe writing that status as user bernhard is wrong?
Comment 13 Richard Fan 2023-09-18 11:01:14 UTC
(In reply to Thomas Blume from comment #12)
> (In reply to Thomas Blume from comment #11)
> > (In reply to Thomas Blume from comment #10)
> > > btw. an earlier test, starting from:
> > > 
> > > https://openqa.suse.de/tests/12119977#step/keymap_or_locale/5
> > > 
> > > also shows the "permission denied" and seems to go wrong afterwards, though
> > > it doesn't seem to be fatal for the overall test run.
> > 
> > The writing still succeeds a bit earlier in the test run:
> > 
> > https://openqa.suse.de/tests/12119977#step/consoletest_setup/62
> > https://openqa.suse.de/tests/12119977#step/consoletest_setup/63
> 
> But that's done by root instead of the user bernhard.
> So, maybe writing that status as user bernhard is wrong?

Yes, you are right, non-root user doesn't have permission to access /dev/ttyS0 by default. however, I don't think this error caused the job failing.

as per passed job in another job group, we can see the hvc1 is logged:

https://openqa.suse.de/tests/12119474/logfile?filename=serial_terminal_user.txt
Comment 14 Thomas Blume 2023-09-18 12:43:58 UTC
(In reply to Richard Fan from comment #13)
> Yes, you are right, non-root user doesn't have permission to access
> /dev/ttyS0 by default. however, I don't think this error caused the job
> failing.
> 
> as per passed job in another job group, we can see the hvc1 is logged:
> 
> https://openqa.suse.de/tests/12119474/logfile?filename=serial_terminal_user.
> txt

Hm, it seems in that test the getty on hvc1 gets activated by the test code.
In:

https://openqa.suse.de/tests/12119474/logfile?filename=autoinst-log.txt

I can see:

-->
[2023-09-14T10:24:31.797716+02:00] [debug] [pid:62166] <<< testapi::script_run(cmd="systemctl enable serial-getty\@hvc1; systemctl start serial-getty\@hvc1", timeout=30, quiet=undef, output="", die_on_timeout=1)
--<


I can't find any message about hvc in:

https://openqa.suse.de/tests/12119977/logfile?filename=autoinst-log.txt
Comment 15 Richard Fan 2023-09-18 13:05:53 UTC
(In reply to Thomas Blume from comment #14)
> (In reply to Richard Fan from comment #13)
> > Yes, you are right, non-root user doesn't have permission to access
> > /dev/ttyS0 by default. however, I don't think this error caused the job
> > failing.
> > 
> > as per passed job in another job group, we can see the hvc1 is logged:
> > 
> > https://openqa.suse.de/tests/12119474/logfile?filename=serial_terminal_user.
> > txt
> 
> Hm, it seems in that test the getty on hvc1 gets activated by the test code.
> In:
> 
> https://openqa.suse.de/tests/12119474/logfile?filename=autoinst-log.txt
> 
> I can see:
> 
> -->
> [2023-09-14T10:24:31.797716+02:00] [debug] [pid:62166] <<<
> testapi::script_run(cmd="systemctl enable serial-getty\@hvc1; systemctl
> start serial-getty\@hvc1", timeout=30, quiet=undef, output="",
> die_on_timeout=1)
> --<
> 
> 
> I can't find any message about hvc in:
> 
> https://openqa.suse.de/tests/12119977/logfile?filename=autoinst-log.txt

Thanks much for your quick findings! yes we have the test code to enable the hvc1,[which is introduced by test module 'system_prepare']

However based on the test result https://openqa.suse.de/tests/11770044#next_previous

Please see the latest passed job https://openqa.suse.de/tests/11781620. it has passed without enabling the hvc1. 
That means the the same test passed before but fails now. It must be caused by something.

Anyway, I think we have a workaround to enable the hvc1, so it is not a critical bug now.

Thanks again.
Comment 16 Thomas Blume 2023-09-18 16:16:15 UTC
(In reply to Richard Fan from comment #15)

> Please see the latest passed job https://openqa.suse.de/tests/11781620. it
> has passed without enabling the hvc1. 
> That means the the same test passed before but fails now. It must be caused
> by something.

I've directly cloned the above test and ran into the same issue as initially reported.
IMHO, that means the change that made the test fail, cannot be in the installation medium (sle-15-SP6-Online-x86_64-Build10.3-textmode@uefi) as the same succeeded in 11781620.

So, the only conclusion I can draw from this, is that something in the openQA testcode has changed.

Unfortunately, I cannot find the logs for test 11781620.
It would be interesting to see whether there is also a login attempt to hvc1 there.

> Anyway, I think we have a workaround to enable the hvc1, so it is not a
> critical bug now.
> 
> Thanks again.

Welcome, for further investigation, it would be good, if you could provide testlogs where the login to hvc1 succeeded but serial-getty@hvc1.service was not enabled by the test code itself.
Comment 17 Huajian Luo 2023-09-20 04:17:39 UTC
After we add system_prepare after first_boot in the test it passed. 
https://openqa.suse.de/tests/12201183#step/consoletest_finish/5 . so maybe a test-related issue.
Comment 18 Thomas Blume 2023-09-20 06:30:40 UTC
(In reply to Huajian Luo from comment #17)
> After we add system_prepare after first_boot in the test it passed. 
> https://openqa.suse.de/tests/12201183#step/consoletest_finish/5 . so maybe a
> test-related issue.

Ok, thanks for the feedback, closing the bug for now.
If you find any indication that the issue might be systemd related in the end, please reopen.