Bug 1211219 - [Build 481.1] openQA test fails in system_prepare - fails to select tty6
Status: NEW
Duplicates: 1216200
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Basesystem
Version: Leap 15.5
Hardware: Other Other
Priority: P5 - None, Severity: Major
Target Milestone: ---
Assignee: E-mail List
QA Contact: E-mail List
URL: https://openqa.opensuse.org/tests/327...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-09 09:00 UTC by Max Lin
Modified: 2023-10-26 12:06 UTC
CC List: 9 users

See Also:
Found By: openQA
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---


Attachments
diff between build480.2 and build481.1 (26.73 KB, text/plain)
2023-05-10 07:17 UTC, Max Lin
Description Max Lin 2023-05-09 09:00:39 UTC
## Observation

openQA test in scenario opensuse-15.5-DVD-x86_64-upgrade_Leap_15.0_gnome@64bit fails in
[system_prepare](https://openqa.opensuse.org/tests/3276029/modules/system_prepare/steps/2)

I don't really know which change broke the tty6 console selection. Compared to the last snapshot, the package changes amount to only a small number of package updates from the last SLE15 SP5 check-in round, including yast2-bootloader, yast2-storage-ng, libstorage-ng, yast2-network, libyui and the kernel. DimStar pointed out to me that he had seen this issue in a yast2-bootloader staging on Factory, where it vanished later. Since the changes are mostly YaST related, I filed this bug as YaST related. To be honest, I'm not sure whether it's a YaST bug, an openQA issue, or kernel related.

This issue happens only on migrations from *Leap < 15.1* [1]; I don't see it in migrations from versions > 15.1 such as 15.2, 15.3 and 15.4. At the failing step, the test should switch to tty6, but that now fails.

[1] https://openqa.opensuse.org/tests/overview?version=15.5&distri=opensuse&build=481.1&groupid=50

## Test suite description
Upgrade scenario from Leap 15.0 with gnome installed.


## Reproducible

Fails since (at least) Build [481.1](https://openqa.opensuse.org/tests/3274745)


## Expected result

Last good: [480.2](https://openqa.opensuse.org/tests/3273344) (or more recent)


## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD&machine=64bit&test=upgrade_Leap_15.0_gnome&version=15.5)
Comment 1 Stefan Hundhammer 2023-05-09 12:28:21 UTC
Why is this a YaST bug?

For all we know, it might be a number of things, including (not limited to):

- Plymouth

- A key combination like Alt-F6 / Ctrl-Alt-F6 not arriving

- The desktop / window system (Wayland? X11?) consuming that key combination

- systemd not starting a getty process on /dev/tty6

- No /dev/tty6 device being created

I see YaST successfully completing the installation, I see the machine successfully rebooting, the desktop successfully starting. YaST isn't involved anymore by a long shot.
Comment 2 Stefan Hundhammer 2023-05-09 12:39:28 UTC
Please try manually if you can switch to any of the other text consoles (/dev/tty2, /dev/tty3, ...) successfully.

On my Leap 15.4, "xev" tells me about Ctrl-Alt-F6:

> KeyRelease event, serial 39, synthetic NO, window 0x4a00001,
>     root 0x1ce, subw 0x0, time 20240717, (129,141), root:(641,763),
>     state 0xc, keycode 72 (keysym 0x1008fe06, XF86Switch_VT_6), same_screen YES,
>     XLookupString gives 0 bytes: 
>     XFilterEvent returns: False


(Just before the screen goes black and I am on console 6; when I switch back to my X11 session and scroll back in the terminal window where I started it)

Does it work that far on that machine? Do you see that event?
Notice "XF86Switch_VT_6".
Comment 3 Stefan Hundhammer 2023-05-09 12:42:37 UTC
As for getty processes, it looks like systemd starts them only on demand. After I did those experiments with Ctrl-Alt-F6 and Ctrl-Alt-F2, "ps" tells me:

> root     12519  0.0  0.0   3144   900 tty2     Ss+  14:33   0:00 /sbin/agetty -o -p -- \u --noclear tty2 linux
> root     12635  0.0  0.0   3144   864 tty6     Ss+  14:34   0:00 /sbin/agetty -o -p -- \u --noclear tty6 linux


Notice that there is no (a)getty process for any other virtual console: Not for /dev/tty1, 3, 4, 5.
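
The same check can be scripted. This is only a minimal sketch, assuming `ps aux`-style lines like the two quoted above; the function name `ttys_with_getty` is hypothetical and the sample text is copied from this comment:

```python
import re

# Sample lines copied from the "ps" output quoted above.
PS_OUTPUT = """\
root     12519  0.0  0.0   3144   900 tty2     Ss+  14:33   0:00 /sbin/agetty -o -p -- \\u --noclear tty2 linux
root     12635  0.0  0.0   3144   864 tty6     Ss+  14:34   0:00 /sbin/agetty -o -p -- \\u --noclear tty6 linux
"""


def ttys_with_getty(ps_text):
    """Return the sorted virtual console numbers that have an agetty process."""
    found = set()
    for line in ps_text.splitlines():
        fields = line.split()
        # In "ps aux" layout, field 7 is the TTY and field 11 starts the command.
        if len(fields) < 11 or not fields[10].endswith("agetty"):
            continue
        match = re.fullmatch(r"tty(\d+)", fields[6])
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


running = ttys_with_getty(PS_OUTPUT)
# Consoles 1..6 without an agetty process in the sample.
missing = [n for n in range(1, 7) if n not in running]
print("agetty running on:", running)
print("no agetty on:", missing)
```

On a live system the input would come from something like `ps aux` instead of the sample string.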
Comment 4 Stefan Hundhammer 2023-05-09 12:55:54 UTC
The upstream getty changelog has a surprising number of entries referring to "getty":

https://github.com/systemd/systemd/blob/main/NEWS

* The systemd-getty-generator now honors a new kernel command line
  argument systemd.getty_auto= and a new environment variable
  $SYSTEMD_GETTY_AUTO that allows turning it off at boot. This is for
  example useful to turn off gettys inside of containers or similar
  environments.

...
...

* During package installation (with `ninja install`), we would create
  symlinks for getty@tty1.service, systemd-networkd.service,
  systemd-networkd.socket, systemd-resolved.service,
  remote-cryptsetup.target, remote-fs.target,
  systemd-networkd-wait-online.service, and systemd-timesyncd.service
  in /etc, as if `systemctl enable` was called for those units, to make
  the system usable immediately after installation. Now this is not
  done anymore, and instead calling `systemctl preset-all` is
  recommended after the first installation of systemd.

...
...

* logind will now always reserve one VT for a text getty (VT6
  by default). Previously if more than 6 X sessions where
  started they took up all the VTs with auto-spawned gettys,
  so that no text gettys were available anymore.

...
...
Comment 5 Stefan Hundhammer 2023-05-09 12:56:25 UTC
(In reply to Stefan Hundhammer from comment #4)
> The upstream getty changelog has a surprising number of entries referring to
               ^^^^^
              systemd
Comment 6 Stefan Hundhammer 2023-05-09 13:02:57 UTC
See  man systemd-logind   and   man logind-conf
Comment 7 Stefan Hundhammer 2023-05-09 13:07:17 UTC
See comment #2.
Comment 8 Stefan Hundhammer 2023-05-09 13:21:30 UTC
I also don't see in the openQA test *how* it attempts the switching to tty6:

https://openqa.opensuse.org/tests/3276029/modules/system_prepare/steps/1/src

Does it send a key combination?
Does it use the "chvt" command?
Some ioctl sent directly from the Perl code?
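
For reference, the ioctl variant would look roughly like the sketch below. It is written in Python for illustration only, not taken from the openQA code (which may do something else entirely). It assumes Linux, enough privileges to open the console device, and the `VT_ACTIVATE` / `VT_WAITACTIVE` request numbers from `<linux/vt.h>`; the function name `switch_vt` is hypothetical:

```python
import fcntl
import os

# Request numbers from <linux/vt.h>.
VT_ACTIVATE = 0x5606    # make the given VT the active one
VT_WAITACTIVE = 0x5607  # block until the given VT is active


def switch_vt(number, console="/dev/tty0"):
    """Switch to virtual terminal `number`, roughly what `chvt number` does.

    Needs Linux and sufficient privileges (usually root) to open the console.
    """
    if not 1 <= number <= 63:
        raise ValueError("VT number out of range: %r" % (number,))
    fd = os.open(console, os.O_RDWR)
    try:
        fcntl.ioctl(fd, VT_ACTIVATE, number)
        fcntl.ioctl(fd, VT_WAITACTIVE, number)
    finally:
        os.close(fd)
```

Calling `switch_vt(6)` as root would be the equivalent of `chvt 6`. If that works while the key combination does not, the problem is on the input side rather than with the console itself.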
Comment 9 Max Lin 2023-05-09 13:21:55 UTC
(In reply to Stefan Hundhammer from comment #1)
> Why is this a YaST bug?
> 
> For all we know, it might be a number of things, including (not limited to):
> 
> - Plymouth
> 
> - A key combination like Alt-F6 / Ctrl-Alt-F6 not arriving
> 
> - The desktop / window system (Wayland? X11?) consuming that key combination
> 
> - systemd not starting a getty process on /dev/tty6
> 
> - No /dev/tty6 device being created
> 
> I see YaST successfully completing the installation, I see the machine
> successfully rebooting, the desktop successfully starting. YaST isn't
> involved anymore by a long shot.

I totally agree with you, Huha: it would be a bit weird for this to be a YaST bug. But the last check-in into SLE15 SP5 consisted mostly of YaST changes, so I filed a YaST bug report in case one of those changes could cause this issue, for example in how parameters/configs are handled on upgrades.

Ctrl-Alt-F6 works on my 15.4 and 15.5 for sure. As explained in https://bugzilla.suse.com/show_bug.cgi?id=1211219#c0, we have only seen this issue on systems upgraded from < 15.1 (15.0, 42.x) to 15.5. It's weird that it doesn't happen with 15.1 and later versions.

I found an agetty error in the serial output.

`
[  101.927960] agetty[2655]: checkname failed: Operation not permitted
[  111.929623] systemd[1]: getty@tty6.service: Deactivated successfully.
[  111.933762] systemd[1]: getty@tty6.service: Scheduled restart job, restart counter is at 1.
[  111.937384] systemd[1]: Stopped Getty on tty6.
[  111.939049] systemd[1]: Started Getty on tty6.
`

I'll have a deeper look.
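
To compare serial logs across builds, a scan along these lines might help. It is a minimal sketch that assumes the log lines look like the excerpt above; the function name `summarize_getty_problems` and both regular expressions are my own and only cover the message formats shown here:

```python
import re

# Sample lines copied from the serial output quoted above.
SERIAL_LOG = """\
[  101.927960] agetty[2655]: checkname failed: Operation not permitted
[  111.929623] systemd[1]: getty@tty6.service: Deactivated successfully.
[  111.933762] systemd[1]: getty@tty6.service: Scheduled restart job, restart counter is at 1.
[  111.937384] systemd[1]: Stopped Getty on tty6.
[  111.939049] systemd[1]: Started Getty on tty6.
"""

RESTART_RE = re.compile(
    r"(getty@tty\d+\.service): Scheduled restart job, restart counter is at (\d+)\."
)
AGETTY_ERROR_RE = re.compile(r"agetty\[\d+\]: (.+)")


def summarize_getty_problems(log_text):
    """Collect agetty error messages and the highest restart counter per unit."""
    errors = []
    restarts = {}
    for line in log_text.splitlines():
        error = AGETTY_ERROR_RE.search(line)
        if error:
            errors.append(error.group(1))
        restart = RESTART_RE.search(line)
        if restart:
            unit, count = restart.group(1), int(restart.group(2))
            restarts[unit] = max(count, restarts.get(unit, 0))
    return errors, restarts


errors, restarts = summarize_getty_problems(SERIAL_LOG)
print("agetty errors:", errors)
print("restart counters:", restarts)
```

A steadily growing restart counter for getty@tty6.service would suggest a restart loop rather than a one-off failure.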
Comment 10 Stefan Hundhammer 2023-05-09 13:43:52 UTC
If you look at the systemd changelog that I linked above, there might have been lots of subtle changes to systemd or any other program from the systemd universe; logind comes to mind.

Maybe one of the post-install etc. scripts of one of those packages doesn't handle some upgrade scenario 100% correctly (that looks like the most realistic source of problems IMHO).

Maybe there was a note somewhere in the change log / systemd release notes that in such and such scenario, we should run some one-time script for a proper migration.

Maybe we are using some parameter or environment variable that has been obsoleted in the meantime.

There are a gazillion possibilities.

But anyway, those systemd messages you quoted don't sound good. Maybe involve one of our systemd experts?
Comment 11 Stefan Hundhammer 2023-05-09 14:42:12 UTC
The systemd maintainers should know a lot more about this. Reassigning.

% osc maintainer -e systemd

Defined in package: Base:System/systemd 
  bugowner of systemd : 
   systemd-maintainers@suse.de

  maintainer of systemd : 
   systemd-maintainers@suse.de, thomas.blume@suse.com, fbui@suse.com

Defined in project:  Base:System
  bugowner of systemd : 
   -

  maintainer of systemd : 
   dmueller@suse.com, meissner@suse.com, ro@suse.de, aj@suse.com, seife@novell.slipkontur.de, trenn@suse.com, werner@suse.com, daniel@molkentin.de, -

Defined in project:  Base
  bugowner of systemd : 
   -

  maintainer of systemd : 
   adrian.schroeter@suse.com, jblunck@novell.com, rguenther@suse.com
Comment 12 Max Lin 2023-05-10 07:17:39 UTC
Created attachment 866892 [details]
diff between build480.2 and build481.1

Build480.2 doesn't have this issue. This file contains a diff between Build480.2 and Build481.1. Since the issue happens on migration via the DVD image, a change in the Build481.1 image is perhaps the root cause.
Comment 13 Stefan Hundhammer 2023-05-10 08:50:39 UTC
We can disregard all those yast2-trans*.rpm packages: That's only translations, no code.

I don't see how any of libyui*, libstorage*, storage-ng* could possibly cause this problem.

virtualbox-kmp-default? I don't know, but it's very unlikely IMHO.

What surprises me, though, is that there is no newer kernel on that image.
Comment 14 Max Lin 2023-05-11 06:08:32 UTC
(In reply to Stefan Hundhammer from comment #13)
> We can disregard all those yast2-trans*.rpm packages: That's only
> translations, no code.
> 
> I don't see how any of libyui*, libstorage*, storage-ng* could possibly
> cause this problem.
> 
> virtualbox-kmp-default? I don't know, but it's very unlikely IMHO.
> 
> What surprises me, though, is that there is no newer kernel on that image.

I agree that the packages above can be disregarded, but yast2-bootloader was on the list as well. I did a bisect build: I rebuilt the product with yast2-bootloader reverted to version 4.5.8, as the 4.5.9 change smells suspicious [1]. The result is Build483.3 https://openqa.opensuse.org/tests/overview?distri=opensuse&version=15.5&build=483.3&groupid=50 , where the console selection issue has disappeared. Another strange issue [2] on aarch64, seen in the previous broken build (Build481.1), has disappeared as well.

Do you have any idea here?

[1] https://build.opensuse.org/package/rdiff/SUSE:SLE-15-SP5:GA/yast2-bootloader?linkrev=base&rev=8
[2] https://openqa.opensuse.org/tests/3275161 - boot directory being uppercase, it used to be lowercase prior to Build481.1
Comment 15 Stefan Hundhammer 2023-05-11 07:45:49 UTC
That would be this PR:

https://github.com/yast/yast-bootloader/pull/684/files

The only code change is this:

https://github.com/yast/yast-bootloader/pull/684/commits/fdd212bc1ed1bc44ed9eca46057fc81e0c1a3a48


Josef, can this cause the described effect here: That there is no more /dev/tty6? I can't quite believe that.
Comment 16 Stefan Hundhammer 2023-05-11 08:22:55 UTC
That PR was the fix for bug #1210811.

In that case, the bootloader proposal was reset one time too many, which caused the user's changes to be discarded.

But that means that the bootloader proposal was done one more time; maybe that is what is missing in that 15-SP1 -> 15-SP5 upgrade scenario? That was buggy all the time AFAICS, but maybe it was hiding another subtle bug in the bootloader configuration; one that was then silently overwritten with a valid one.
Comment 18 Franck Bui 2023-05-12 12:27:41 UTC
So I downloaded the qcow2 image used by the test [1] and gave it a test. It runs a Leap 15.0 that is not up to date. Surprisingly, switching to different TTYs already doesn't work there. Updating the system didn't help, and neither did starting the system with multi-user.target.

After some investigation it appears that removing the kernel option "video=1024x768-16" fixes the problem. I don't know whether this option is specific to systems installed for openQA, but it has a negative impact on the frame buffer. I tested this option on SLE15-SP4 and saw the same (broken) behavior. Only TW seems to work fine with (or without) it.

Does that ring a bell for anyone? Otherwise I'll pass the buck to the kernel team.

[1] https://openqa.opensuse.org/tests/3274745/asset/hdd/opensuse-15.0-x86_64-GM-gnome@64bit.qcow2
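
To check quickly whether a given system boots with such an option, something like the following sketch could be used. It handles only the simplified form `video=[<name>:]<xres>x<yres>[-<bpp>][@<refresh>]`; the full kernel syntax (see Documentation/fb/modedb.rst) has more flags. The function name `parse_video_option` and the sample command line are made up for illustration:

```python
import re

# Simplified pattern: video=[<name>:]<xres>x<yres>[-<bpp>][@<refresh>]
VIDEO_RE = re.compile(
    r"^video=(?:(?P<name>[^:\s]+):)?(?P<xres>\d+)x(?P<yres>\d+)"
    r"(?:-(?P<bpp>\d+))?(?:@\d+)?$"
)


def parse_video_option(cmdline):
    """Return the first video= mode in a kernel command line as a dict, or None."""
    for word in cmdline.split():
        match = VIDEO_RE.match(word)
        if match:
            bpp = match.group("bpp")
            return {
                "name": match.group("name"),
                "xres": int(match.group("xres")),
                "yres": int(match.group("yres")),
                "bpp": int(bpp) if bpp else None,
            }
    return None


# Hypothetical command line; on a real system read /proc/cmdline instead.
sample = "root=/dev/vda2 video=1024x768-16 quiet"
mode = parse_video_option(sample)
print(mode)
```

A result with `bpp` equal to 16 would indicate the setup that showed the broken behavior here.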
Comment 19 Franck Bui 2023-05-16 13:55:19 UTC
Reassigning to the kernel team to investigate why the "video=xxx" option prevents the system from navigating through the different ttys properly.
Comment 20 Takashi Iwai 2023-05-22 15:30:33 UTC
Adding graphics guys to Cc.
Comment 21 Patrik Jakobsson 2023-05-23 06:49:10 UTC
(In reply to Franck Bui from comment #18)
> So I downloaded the qcow2 image used by the test [1] and gave it a test. It
> runs a Leap 15.0 that is not up to date. Surprisingly, switching to different
> TTYs already doesn't work there. Updating the system didn't help, and neither
> did starting the system with multi-user.target.
> 
> After some investigation it appears that removing the kernel option
> "video=1024x768-16" fixes the problem. I don't know whether this option is
> specific to systems installed for openQA, but it has a negative impact on the
> frame buffer. I tested this option on SLE15-SP4 and saw the same (broken)
> behavior. Only TW seems to work fine with (or without) it.
> 
> Does that ring a bell for anyone? Otherwise I'll pass the buck to the
> kernel team.
> 
> [1]
> https://openqa.opensuse.org/tests/3274745/asset/hdd/opensuse-15.0-x86_64-GM-gnome@64bit.qcow2

Can you try with video=1024x768-32 instead to see if that makes a difference?
Comment 22 Max Lin 2023-05-23 12:52:49 UTC
I gave it a try: I upgraded the system from Leap 15.0 with gnome installed [1] to Leap 15.5 (Build481.1, which includes https://build.opensuse.org/package/rdiff/SUSE:SLE-15-SP5:GA/yast2-bootloader?linkrev=base&rev=8). When booting with "1024x768-32" I can successfully switch to tty6, or any other tty. With the original parameter "1024x768-16", selecting tty6 fails.

[1] https://openqa.opensuse.org/tests/3274745/asset/hdd/opensuse-15.0-x86_64-GM-gnome@64bit.qcow2
Comment 23 Takashi Iwai 2023-05-23 13:01:48 UTC
AFAIK, the 16bpp comes from the fact that the cirrus driver didn't work with 32bpp.  Now we're using another driver (bochs or virtio), so *-16 is rather meaningless.

OTOH, we should fix a bug if it's really due to some driver bug, of course...
Comment 25 Max Lin 2023-10-26 11:27:11 UTC
*** Bug 1216200 has been marked as a duplicate of this bug. ***
Comment 26 Fabian Vogt 2023-10-26 12:06:18 UTC
My interpretation is that video=...-16 means that the fb console attempts a 16bpp mode but this only works with cirrus for graphics. Otherwise I don't see how this would've passed openQA testing back then.

I see three options:
a) Edit the upgrade tests to no longer pass the video=...-16 param
b) Run the affected upgrade tests with QEMUVGA=cirrus (might need new needles!)
c) Fix the kernel to support video=...-16 with other graphics
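
For option a), the edit itself is small. A rough sketch of the idea in Python (the actual openQA tests are Perl and may build the command line differently); the function name `drop_16bpp_video` and the sample command lines are hypothetical:

```python
import re

# Matches a whole video=...-16 word, e.g. "video=1024x768-16" or
# "video=1024x768-16@60".
VIDEO_16BPP_RE = re.compile(r"^video=\S*-16(?:@\d+)?$")


def drop_16bpp_video(cmdline):
    """Return the kernel command line without any video=...-16 parameter."""
    kept = [word for word in cmdline.split() if not VIDEO_16BPP_RE.match(word)]
    return " ".join(kept)


# Hypothetical command lines for illustration.
print(drop_16bpp_video("splash=silent video=1024x768-16 quiet"))
print(drop_16bpp_video("video=1024x768-32 quiet"))
```

Parameters with another depth, such as video=1024x768-32, are left untouched, since comment 22 reports that mode works.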