Bugzilla – Bug 1211219
[Build 481.1] openQA test fails in system_prepare - fails to select tty6
Last modified: 2023-10-26 12:06:18 UTC
## Observation

openQA test in scenario opensuse-15.5-DVD-x86_64-upgrade_Leap_15.0_gnome@64bit fails in
[system_prepare](https://openqa.opensuse.org/tests/3276029/modules/system_prepare/steps/2)

I don't really know which change broke selecting the tty6 console. Compared to the last snapshot, only a small number of packages were updated, as part of the last SLE15 SP5 check-in round, including yast2-bootloader/yast2-storage-ng/libstorage-ng/yast2-network/libyui/kernel. DimStar pointed out to me that he had seen this issue in a yast2-bootloader staging on Factory; it vanished later. Since the changes are (mostly) yast related, I am filing this bug as yast related.

To be honest, I'm not sure whether it's a yast bug, an openQA issue, or kernel related.

This issue happens on *Leap < 15.1* migration only[1]; I don't see it in > 15.1 migrations like 15.2, 15.3 and 15.4.

At the failing step, the test should switch to tty6, but that now fails.

[1] https://openqa.opensuse.org/tests/overview?version=15.5&distri=opensuse&build=481.1&groupid=50

## Test suite description

Upgrade scenario from Leap 15.0 with gnome installed.

## Reproducible

Fails since (at least) Build [481.1](https://openqa.opensuse.org/tests/3274745)

## Expected result

Last good: [480.2](https://openqa.opensuse.org/tests/3273344) (or more recent)

## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD&machine=64bit&test=upgrade_Leap_15.0_gnome&version=15.5)
Why is this a YaST bug?

For all we know, it might be a number of things, including (not limited to):

- Plymouth

- A key combination like Alt-F6 / Ctrl-Alt-F6 not arriving

- The desktop / window system (Wayland? X11?) consuming that key combination

- systemd not starting a getty process on /dev/tty6

- No /dev/tty6 device being created

I see YaST successfully completing the installation, I see the machine successfully rebooting, the desktop successfully starting. YaST isn't involved anymore by a long shot.
Please try manually if you can switch to any of the other text consoles (/dev/tty2, /dev/tty3, ...) successfully.

On my Leap 15.4, "xev" tells me about Ctrl-Alt-F6:

> KeyRelease event, serial 39, synthetic NO, window 0x4a00001,
> root 0x1ce, subw 0x0, time 20240717, (129,141), root:(641,763),
> state 0xc, keycode 72 (keysym 0x1008fe06, XF86Switch_VT_6), same_screen YES,
> XLookupString gives 0 bytes:
> XFilterEvent returns: False

(Just before the screen goes black and I am on console 6; when I switch back to my X11 session and scroll back in the terminal window where I started it)

Does it work that far on that machine? Do you see that event? Notice "XF86Switch_VT_6".
As for getty processes, it looks like systemd starts them only on demand. After I did those experiments with Ctrl-Alt-F6 and Ctrl-Alt-F2, "ps" tells me:

> root 12519 0.0 0.0 3144 900 tty2 Ss+ 14:33 0:00 /sbin/agetty -o -p -- \u --noclear tty2 linux
> root 12635 0.0 0.0 3144 864 tty6 Ss+ 14:34 0:00 /sbin/agetty -o -p -- \u --noclear tty6 linux

Notice that there is no (a)getty process for any other virtual console: Not for /dev/tty1, 3, 4, 5.
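The manual check above can be repeated on any machine with a small script. This is only a sketch: it assumes `ps aux`-style lines like the ones quoted, and the function name `ttys_with_getty` is made up for illustration.

```python
import re

# Matches an agetty command line as printed by "ps aux" and captures the
# tty name passed as its argument (e.g. "tty6"). The ps TTY column appears
# before the command, so it is not what gets captured here.
AGETTY_RE = re.compile(r"/sbin/agetty\b.*\s(tty\d+)\b")


def ttys_with_getty(ps_output: str) -> list:
    """Return the tty names (sorted by number) that have an agetty process."""
    found = set()
    for line in ps_output.splitlines():
        match = AGETTY_RE.search(line)
        if match:
            found.add(match.group(1))
    # Sort numerically so that tty10 would come after tty6.
    return sorted(found, key=lambda name: int(name[3:]))
```

Feeding it the text printed by `ps aux` (for example via `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`) gives the list of consoles that currently have a getty; a console missing from the list has simply not been activated yet if gettys are started on demand.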
The upstream getty changelog has a surprising number of entries referring to "getty":

https://github.com/systemd/systemd/blob/main/NEWS

* The systemd-getty-generator now honors a new kernel command line argument systemd.getty_auto= and a new environment variable $SYSTEMD_GETTY_AUTO that allows turning it off at boot. This is for example useful to turn off gettys inside of containers or similar environments.

...
...

* During package installation (with `ninja install`), we would create symlinks for getty@tty1.service, systemd-networkd.service, systemd-networkd.socket, systemd-resolved.service, remote-cryptsetup.target, remote-fs.target, systemd-networkd-wait-online.service, and systemd-timesyncd.service in /etc, as if `systemctl enable` was called for those units, to make the system usable immediately after installation. Now this is not done anymore, and instead calling `systemctl preset-all` is recommended after the first installation of systemd.

...
...

* logind will now always reserve one VT for a text getty (VT6 by default). Previously if more than 6 X sessions where started they took up all the VTs with auto-spawned gettys, so that no text gettys were available anymore.

...
...
(In reply to Stefan Hundhammer from comment #4)
> The upstream getty changelog has a surprising number of entries referring to
               ^^^^^
               systemd
See man systemd-logind and man logind.conf
See comment #2.
I also don't see in the openQA test *how* it attempts the switching to tty6:

https://openqa.opensuse.org/tests/3276029/modules/system_prepare/steps/1/src

Does it send a key combination? Does it use the "chvt" command? Some ioctl sent directly from the Perl code?
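For reference, the ioctl route mentioned above looks roughly like the following. This is a sketch of what a `chvt`-style switch does on Linux, not what the openQA test does (the thread does not say). The request numbers are the ones from `<linux/vt.h>`; the `ioctl` parameter exists only so the function can be exercised without a real console, and a real switch needs sufficient privileges on the console device.

```python
import fcntl
import os

# ioctl request numbers from <linux/vt.h>.
VT_ACTIVATE = 0x5606    # make the given VT the active one
VT_WAITACTIVE = 0x5607  # block until the given VT is active


def switch_vt(number: int, console: str = "/dev/tty0", ioctl=fcntl.ioctl) -> None:
    """Switch to virtual terminal `number`, the way a chvt-style tool does."""
    # Linux supports at most 63 virtual consoles, numbered from 1.
    if not 1 <= number <= 63:
        raise ValueError(f"invalid VT number: {number}")
    fd = os.open(console, os.O_RDWR)
    try:
        ioctl(fd, VT_ACTIVATE, number)
        ioctl(fd, VT_WAITACTIVE, number)
    finally:
        os.close(fd)
```

If `switch_vt(6)` succeeds but a key combination does not, the problem is more likely in key delivery than in the VT layer itself.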
(In reply to Stefan Hundhammer from comment #1)
> Why is this a YaST bug?
>
> For all we know, it might be a number of things, including (not limited to):
>
> - Plymouth
>
> - A key combination like Alt-F6 / Ctrl-Alt-F6 not arriving
>
> - The desktop / window system (Wayland? X11?) consuming that key combination
>
> - systemd not starting a getty process on /dev/tty6
>
> - No /dev/tty6 device being created
>
> I see YaST successfully completing the installation, I see the machine
> successfully rebooting, the desktop successfully starting. YaST isn't
> involved anymore by a long shot.

I totally agree with you, Huha: it would be a bit weird for this to be a yast bug. But the last check-in into SLE15 SP5 consists mostly of yast changes, so I filed a yast bug report in case one of those changes causes this issue, e.g. in how parameters/configs are handled on upgrades.

Ctrl-Alt-F6 works on my 15.4 and 15.5 for sure. As https://bugzilla.suse.com/show_bug.cgi?id=1211219#c0 explains, we have only seen this issue on systems upgraded from < 15.1 (15.0, 42.x) to 15.5. It's weird that it doesn't happen on 15.1 and later versions.

I found an agetty error in the serial output.

`
[ 101.927960] agetty[2655]: checkname failed: Operation not permitted
[ 111.929623] systemd[1]: getty@tty6.service: Deactivated successfully.
[ 111.933762] systemd[1]: getty@tty6.service: Scheduled restart job, restart counter is at 1.
[ 111.937384] systemd[1]: Stopped Getty on tty6.
[ 111.939049] systemd[1]: Started Getty on tty6.
`

I'll have a deeper look.
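To spot this pattern in other serial logs, a small filter along these lines might help. It is a sketch that assumes the message formats exactly as quoted above; `summarize_getty_log` is a hypothetical helper name.

```python
import re

# Message formats as they appear in the serial output quoted above.
RESTART_RE = re.compile(
    r"getty@(tty\d+)\.service: Scheduled restart job, "
    r"restart counter is at (\d+)"
)
AGETTY_MSG_RE = re.compile(r"agetty\[\d+\]: (.+)")


def summarize_getty_log(log_text: str):
    """Return (restarts, agetty_messages) found in a serial log.

    restarts maps a tty name to the highest restart counter seen for its
    getty@ unit; agetty_messages lists what agetty itself logged.
    """
    restarts = {}
    messages = []
    for line in log_text.splitlines():
        match = RESTART_RE.search(line)
        if match:
            tty, count = match.group(1), int(match.group(2))
            restarts[tty] = max(restarts.get(tty, 0), count)
            continue
        match = AGETTY_MSG_RE.search(line)
        if match:
            messages.append(match.group(1).strip())
    return restarts, messages
```

A restart counter that keeps growing for `tty6` would indicate that the getty is being respawned in a loop rather than failing once.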
If you look at the systemd changelog that I linked above, there might have been lots of subtle changes to systemd or any other program from the systemd universe; logind comes to mind. Maybe one of the post-install etc. scripts of one of those packages doesn't handle some upgrade scenario 100% correctly (that looks like the most realistic source of problems IMHO). Maybe there was a note somewhere in the change log / systemd release notes that in such and such scenario, we should run some one-time script for a proper migration. Maybe we are using some parameter or environment variable that has been obsoleted in the meantime. There are a gazillion possibilities. But anyway, those systemd messages you quoted don't sound good. Maybe involve one of our systemd experts?
The systemd maintainers should know a lot more about this. Reassigning.

% osc maintainer -e systemd
Defined in package: Base:System/systemd
  bugowner of systemd : systemd-maintainers@suse.de
  maintainer of systemd : systemd-maintainers@suse.de, thomas.blume@suse.com, fbui@suse.com

Defined in project: Base:System
  bugowner of systemd : -
  maintainer of systemd : dmueller@suse.com, meissner@suse.com, ro@suse.de, aj@suse.com, seife@novell.slipkontur.de, trenn@suse.com, werner@suse.com, daniel@molkentin.de, -

Defined in project: Base
  bugowner of systemd : -
  maintainer of systemd : adrian.schroeter@suse.com, jblunck@novell.com, rguenther@suse.com
Created attachment 866892 [details]
diff between build480.2 and build481.1

Build480.2 doesn't have this issue. This file contains a diff between Build480.2 and Build481.1. Since this issue happens on migration via the DVD image, a change in the Build481.1 image is perhaps the root cause.
We can disregard all those yast2-trans*.rpm packages: That's only translations, no code. I don't see how any of libyui*, libstorage*, storage-ng* could possibly cause this problem. virtualbox-kmp-default? I don't know, but it's very unlikely IMHO. What surprises me, though, is that there is no newer kernel on that image.
(In reply to Stefan Hundhammer from comment #13)
> We can disregard all those yast2-trans*.rpm packages: That's only
> translations, no code.
>
> I don't see how any of libyui*, libstorage*, storage-ng* could possibly
> cause this problem.
>
> virtualbox-kmp-default? I don't know, but it's very unlikely IMHO.
>
> What surprises me, though, is that there is no newer kernel on that image.

I agree that the above packages can be disregarded, but yast2-bootloader was on the list as well. I did a bisect build: I rebuilt the product with yast2-bootloader reverted to version 4.5.8, as the 4.5.9 change smells suspicious[1]. That gave us Build483.3 https://openqa.opensuse.org/tests/overview?distri=opensuse&version=15.5&build=483.3&groupid=50 , where the console selecting issue has disappeared, and another strange issue[2] on aarch64 from the previous broken build (Build481.1) has disappeared as well.

Do you have any idea here?

[1] https://build.opensuse.org/package/rdiff/SUSE:SLE-15-SP5:GA/yast2-bootloader?linkrev=base&rev=8
[2] https://openqa.opensuse.org/tests/3275161 - boot directory being uppercase; it used to be lowercase prior to Build481.1
That would be this PR: https://github.com/yast/yast-bootloader/pull/684/files The only code change is this: https://github.com/yast/yast-bootloader/pull/684/commits/fdd212bc1ed1bc44ed9eca46057fc81e0c1a3a48 Josef, can this cause the described effect here: That there is no more /dev/tty6? I can't quite believe that.
That PR was the fix for bug #1210811. In that case, the bootloader proposal was reset once too often, which caused the user's changes to be discarded. But that means that the bootloader proposal was done one more time; maybe that is what is missing in that 15-SP1 -> 15-SP5 upgrade scenario? That was buggy all the time AFAICS, but maybe it was hiding another subtle bug in the bootloader configuration; one that was then silently overwritten with a valid one.
So I downloaded the qcow2 image used by the test [1] and gave it a test. It runs a Leap 15.0 that is not up to date. Surprisingly, switching to different TTYs already doesn't work there. Updating the system didn't help, and neither did starting the system with the multi-user.target.

After some investigation it appears that removing the kernel option "video=1024x768-16" fixes the problem. I don't know whether this option is specific to systems installed for openQA, but it has a negative impact on the frame buffer. I tested this option on SLE15-SP4 and got the same (broken) behavior. Only TW seems to work fine with (or without) it.

Does that ring a bell for someone? Otherwise I'll pass the buck to the kernel team.

[1] https://openqa.opensuse.org/tests/3274745/asset/hdd/opensuse-15.0-x86_64-GM-gnome@64bit.qcow2
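To check quickly whether a given system was booted with such an option, the kernel command line (e.g. the contents of /proc/cmdline) can be parsed. The sketch below only handles the common `video=[<connector>:]<xres>x<yres>[-<bpp>]` form of the option, not every variant the kernel accepts; `parse_video_option` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# video=[<connector>:]<xres>x<yres>[M][R][-<bpp>]; other suffixes such as
# @<refresh> are ignored by this simplified pattern.
VIDEO_RE = re.compile(r"\bvideo=(?:[^\s:]+:)?(\d+)x(\d+)[MR]*(?:-(\d+))?")


def parse_video_option(cmdline: str) -> Optional[Tuple[int, int, Optional[int]]]:
    """Return (xres, yres, bpp) from the first video= mode, or None.

    bpp is None when the option does not request a colour depth.
    """
    match = VIDEO_RE.search(cmdline)
    if match is None:
        return None
    xres, yres, bpp = match.groups()
    return int(xres), int(yres), int(bpp) if bpp is not None else None
```

For the option named above, `parse_video_option("video=1024x768-16")` yields a depth of 16; reading the string from /proc/cmdline on the affected image would show whether the option is really in effect.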
Reassigning to the kernel team to investigate why the "video=xxx" option prevents the system from switching between the different ttys properly.
Adding graphics guys to Cc.
(In reply to Franck Bui from comment #18)
> So I downloaded the qcow2 image used by the test [1] and gave it a test. It
> runs a non up to date Leap 15.0. Surprisingly switching to different TTYs
> already doesn't work. Updating the system didn't help and starting the
> system with the multi-user.target didn't either.
>
> After some investigation it appears that removing the kernel option
> "video=1024x768-16" fixes the problem. I don't know whether this option is
> specific to systems installed for openqa but it has a negative impact on the
> frame buffer. I tested this option on SLE15-SP4 and same (broken) behavior.
> Only TW seems works fine with (or without) it.
>
> Does that ring a bell to someone ? otherwise I'll pass the buck to the
> kernel team.
>
> [1]
> https://openqa.opensuse.org/tests/3274745/asset/hdd/opensuse-15.0-x86_64-GM-gnome@64bit.qcow2

Can you try with video=1024x768-32 instead to see if that makes a difference?
I gave it a try: I upgraded the system from Leap 15.0 with gnome installed[1] to Leap 15.5 (Build481.1, which includes https://build.opensuse.org/package/rdiff/SUSE:SLE-15-SP5:GA/yast2-bootloader?linkrev=base&rev=8 ). When booting with "1024x768-32", I can successfully switch to tty6 or any other tty. With the original parameter "1024x768-16", selecting tty6 fails.

[1] https://openqa.opensuse.org/tests/3274745/asset/hdd/opensuse-15.0-x86_64-GM-gnome@64bit.qcow2
AFAIK, the 16bpp comes from the fact that the cirrus driver didn't work with 32bpp. Now we're using another driver (bochs or virtio), so *-16 is rather meaningless.

OTOH, we should fix a bug if it's really due to some driver bug, of course...
*** Bug 1216200 has been marked as a duplicate of this bug. ***
My interpretation is that video=...-16 means that the fb console attempts a 16bpp mode but this only works with cirrus for graphics. Otherwise I don't see how this would've passed openQA testing back then.

I see three options:

a) Edit the upgrade tests to no longer pass the video=...-16 param
b) Run the affected upgrade tests with QEMUVGA=cirrus (might need new needles!)
c) Fix the kernel to support video=...-16 with other graphics
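For option a), the change boils down to dropping the depth suffix from the boot parameter wherever it is set. The thread does not show where the tests or the installed image define it, so the following only sketches the string rewrite on a plain command line; `drop_video_depth` is a hypothetical helper, not part of openQA.

```python
import re

# Captures everything of a video= mode up to (but not including) "-<bpp>".
VIDEO_DEPTH_RE = re.compile(r"(\bvideo=(?:[^\s:]+:)?\d+x\d+[MR]*)-\d+")


def drop_video_depth(cmdline: str) -> str:
    """Remove the -<bpp> suffix from any video= mode on a kernel command line.

    The resolution and any later suffix (such as @<refresh>) are kept, so the
    driver is free to choose its own colour depth.
    """
    return VIDEO_DEPTH_RE.sub(r"\1", cmdline)
```

With this, `video=1024x768-16` becomes `video=1024x768`, while command lines without a depth suffix pass through unchanged.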