Bug 1212752

Summary: On VM installed via autoyast, boot messages get blocked with Autoyast init script service when booting from a read-only snapshot
Product: [openSUSE] PUBLIC SUSE Linux Enterprise Server 15 SP5 Reporter: Richard Fan <richard.fan>
Component: OtherAssignee: Cliff Zhao <qzhao>
Status: RESOLVED NORESPONSE QA Contact: Cliff Zhao <qzhao>
Severity: Normal    
Priority: P5 - None CC: richard.fan, santiago.zarate
Version: unspecified   
Target Milestone: ---   
Hardware: x86-64   
OS: SLES 15   
URL: https://openqa.suse.de/tests/11366577/modules/boot_into_snapshot/steps/18
Whiteboard:
Found By: openQA Services Priority:
Business Priority: Blocker: Yes
Marketing QA Status: --- IT Deployment: ---
Attachments: screen shot when hitting the boot issue
full serial logs for system during the tests
y2logs for the VM

Description Richard Fan 2023-06-27 01:54:07 UTC
##Description
OS: sle15sp5
setup: Virtual machine on kvm host

The sample command to boot up the VM is:

#/usr/bin/qemu-system-x86_64 \
-device VGA,edid=on,xres=1024,yres=768 \
-m 1024 \
-cpu qemu64 \
-netdev user,id=qanet0 \
-device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 \
-object rng-random,filename=/dev/urandom,id=rng0 \
-device virtio-rng-pci,rng=rng0 \
-boot order=c \
-device qemu-xhci \
-device usb-tablet \
-smp 1 \
-enable-kvm \
-vnc :105 \
-device virtio-serial \
-chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on \
-device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console \
-chardev pipe,id=virtio_console_user,path=virtio_console_user,logfile=virtio_console_user.log,logappend=on \
-device virtconsole,chardev=virtio_console_user,name=org.openqa.console.virtio_console_user \
-chardev socket,path=qmp_socket,server=on,wait=off,id=qmp_socket,logfile=qmp_socket.log,logappend=on \
-qmp chardev:qmp_socket \
-device virtio-scsi-pci,id=scsi0 \
-blockdev driver=file,node-name=hd0-overlay0-file,filename=/var/lib/openqa/pool/15/raid/hd0-overlay0,cache.no-flush=on \
-blockdev driver=qcow2,node-name=hd0-overlay0,file=hd0-overlay0-file,cache.no-flush=on,discard=unmap \
-device virtio-blk,id=hd0-device,drive=hd0-overlay0,bootindex=0,serial=hd0
The system is installed via autoyast, when I try to reboot the system and start bootloader from read-only snapshot created after system installation. [I think it is created by default]. I can see the system boot messages get blocked at "Starting Autoyast2 Init Scripts". please see attached picture.

However, it seem only get blocked at QEMU VNC console, I can see that the system is booted up and show login prompt at serial console side.

I can see some error messages in serial logs which may help something: [Please see attached file for detail information]

Jun 15 23:29:48.939689 localhost systemd[1]: Starting Autoyast2 Init Scripts...
Jun 15 23:29:48.942084 localhost systemd[1]: Starting Postfix Mail Transport Agent...
Jun 15 23:29:48.942133 localhost systemd[1]: Condition check resulted in Rollback Helper for Registration being skipped.
Jun 15 23:29:48.942879 localhost systemd[1]: Starting System Logging Service...
Jun 15 23:29:48.943545 localhost systemd[1]: Starting OpenSSH Daemon...
Jun 15 23:29:49.034197 localhost sshd-gen-keys-start[1371]: Checking for missing server keys in /etc/ssh
Jun 15 23:29:49.072407 localhost rsyslogd[1378]: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' (fd 3) from systemd.  [v8.2106.0]
Jun 15 23:29:49.072474 localhost systemd[1]: Started System Logging Service.
Jun 15 23:29:49.072412 localhost rsyslogd[1378]: [origin software="rsyslogd" swVersion="8.2106.0" x-pid="1378" x-info="https://www.rsyslog.com"] start
Jun 15 23:29:49.079244 localhost sshd[1379]: Server listening on 0.0.0.0 port 22.
Jun 15 23:29:49.080531 localhost sshd[1379]: Server listening on :: port 22.
Jun 15 23:29:49.080636 localhost systemd[1]: Started OpenSSH Daemon.
Jun 15 23:29:49.098101 localhost systemd[1]: Received SIGRTMIN+21 from PID 297 (plymouthd).
Jun 15 23:29:49.199403 localhost autoyast-initscripts.sh[1385]: Failed to disable unit: File /etc/systemd/system/default.target.wants/autoyast-initscripts.service: Read-only file system
Jun 15 23:29:49.203637 localhost systemd[1]: Finished Autoyast2 Init Scripts.


Expected result:

OS can boot up successfully from read-only snapshot without boot messages blocked.

----------------------------------
Open QA findings:

## Observation

openQA test in scenario sle-15-SP5-Online-x86_64-boot_to_snapshot@64bit fails in
[boot_into_snapshot](https://openqa.suse.de/tests/11366577/modules/boot_into_snapshot/steps/18)

## Test suite description
Maintainer: QE Core
Test booting to automatically created snapshots.


## Reproducible

Fails since (at least) Build [rfan_autoyast](https://openqa.suse.de/tests/11366532)


## Expected result

Last good: [102.1](https://openqa.suse.de/tests/11164249) (or more recent)


## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Online&machine=64bit&test=boot_to_snapshot&version=15-SP5)
Comment 1 Richard Fan 2023-06-27 01:54:52 UTC
Created attachment 867829 [details]
screen shot when hitting the boot issue
Comment 2 Richard Fan 2023-06-27 01:55:34 UTC
Created attachment 867830 [details]
full serial logs for system during the tests
Comment 3 Richard Fan 2023-06-27 01:57:15 UTC
Created attachment 867831 [details]
y2logs for the VM
Comment 4 Santiago Zarate 2023-06-28 07:50:45 UTC
I wonder if it's related somehow to: https://bugzilla.suse.com/show_bug.cgi?id=1207284
Comment 5 Stefan Hundhammer 2023-06-28 08:37:09 UTC
This does not look like an installer bug.

If the system doesn't boot at all because the bootloader isn't installed correctly, or if the bootloader doesn't find the root filesytem, yes; but here it gets stuck in the middle of the boot process.
Comment 6 Stefan Hundhammer 2023-06-28 08:46:07 UTC
Does it work when you try to boot into one of the other snapshots?

If yes, that means that this particular snapshot may be corrupted (no idea why).
Comment 7 Stefan Hundhammer 2023-06-28 08:51:54 UTC
I understand you observed that it's only the system messages on the console that have a problem, and that otherwise the system booted alright. Correct?


Jun 15 23:29:49.098101 localhost systemd[1]: 
  Received SIGRTMIN+21 from PID 297 (plymouthd).

https://askubuntu.com/questions/956160/received-sigrtmin21-from-pid-251-plymouthd

Some of the answers there seem to suggest that that signal is supposed to happen during boot (not sure about that), and that it switches off console messages; which might be what you see.

This might also indicate that there is a problem with plymouth.
Comment 8 Stefan Hundhammer 2023-06-28 09:03:22 UTC
From the full serial logs (see attachment):

> ...
> ...
> Jun 15 23:29:43.400917 localhost polkitd[766]: Started polkitd version 121
> Jun 15 23:29:43.403242 localhost polkitd[766]: Loading rules from directory /etc/polkit-1/rules.d
> Jun 15 23:29:43.403273 localhost polkitd[766]: Loading rules from directory /usr/share/polkit-1/rules.d
> Jun 15 23:29:43.407511 localhost polkitd[766]: Finished loading, compiling and executing 2 rules
> Jun 15 23:29:43.407819 localhost dbus-daemon[588]: [system] Successfully activated service 'org.freedesktop.PolicyKit1'
> Jun 15 23:29:43.408034 localhost systemd[1]: Started Authorization Manager.
> Jun 15 23:29:43.409038 localhost polkitd[766]: Acquired the name org.freedesktop.PolicyKit1 on the system bus
> Jun 15 23:29:43.778401 localhost wickedd-dhcp4[687]: eth0: Request to acquire DHCPv4 lease with UUID 27d78b64-b0d3-0b00-b602-000004000000
> Jun 15 23:29:43.779595 localhost wickedd-dhcp6[689]: eth0: Request to acquire DHCPv6 lease with UUID 27d78b64-b0d3-0b00-b602-000005000000 in mode auto
> Jun 15 23:29:43.785931 localhost kernel: NET: Registered PF_PACKET protocol family
> Jun 15 23:29:43.838638 localhost nscd[595]: 595 ignored inotify event for `/etc/resolv.conf` (file exists)
> Jun 15 23:29:43.838649 localhost nscd[595]: 595 ignored inotify event for `/etc/resolv.conf` (file exists)
> Jun 15 23:29:44.421973 localhost wickedd-dhcp4[687]: eth0: Committed DHCPv4 lease with address 10.0.2.15 (lease time 86400, renew in 43200 sec, rebind in 75600 sec)
> Jun 15 23:29:45.538267 localhost wickedd-dhcp6[689]: eth0: DHCPv6 is disabled by IPv6 router RA
> Jun 15 23:29:46.594080 localhost wickedd-dhcp6[689]: eth0: DHCPv6 is disabled by IPv6 router RA
> Jun 15 23:29:47.522053 localhost wickedd-dhcp6[689]: eth0: DHCPv6 is disabled by IPv6 router RA
> Jun 15 23:29:48.935432 localhost wicked[708]: lo              up
> Jun 15 23:29:48.935432 localhost wicked[708]: eth0            up
> Jun 15 23:29:48.936719 localhost systemd[1]: Finished wicked managed network interfaces.
> Jun 15 23:29:48.937025 localhost systemd[1]: Reached target Network.
> Jun 15 23:29:48.937179 localhost systemd[1]: Started Backup of /etc/sysconfig.
> Jun 15 23:29:48.937260 localhost systemd[1]: Reached target Network is Online.
> Jun 15 23:29:48.937360 localhost systemd[1]: Started Schedule daily SUSEConnect --keepalive calls.
> Jun 15 23:29:48.937438 localhost systemd[1]: Reached target Timer Units.
> Jun 15 23:29:48.939689 localhost systemd[1]: Starting Autoyast2 Init Scripts...
> Jun 15 23:29:48.942084 localhost systemd[1]: Starting Postfix Mail Transport Agent...
> Jun 15 23:29:48.942133 localhost systemd[1]: Condition check resulted in Rollback Helper for Registration being skipped.
> Jun 15 23:29:48.942879 localhost systemd[1]: Starting System Logging Service...
> Jun 15 23:29:48.943545 localhost systemd[1]: Starting OpenSSH Daemon...
> .
> .
> ======================== Messages on console end here =========================
> .
> .
> Jun 15 23:29:49.034197 localhost sshd-gen-keys-start[1371]: Checking for missing server keys in /etc/ssh
> Jun 15 23:29:49.072407 localhost rsyslogd[1378]: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' (fd 3) from systemd.  [v8.2106.0]
> Jun 15 23:29:49.072474 localhost systemd[1]: Started System Logging Service.
> Jun 15 23:29:49.072412 localhost rsyslogd[1378]: [origin software="rsyslogd" swVersion="8.2106.0" x-pid="1378" x-info="https://www.rsyslog.com"] start
> Jun 15 23:29:49.079244 localhost sshd[1379]: Server listening on 0.0.0.0 port 22.
> Jun 15 23:29:49.080531 localhost sshd[1379]: Server listening on :: port 22.
> Jun 15 23:29:49.080636 localhost systemd[1]: Started OpenSSH Daemon.
> Jun 15 23:29:49.098101 localhost systemd[1]: Received SIGRTMIN+21 from PID 297 (plymouthd).
> Jun 15 23:29:49.199403 localhost autoyast-initscripts.sh[1385]: Failed to disable unit: File /etc/systemd/system/default.target.wants/autoyast-initscripts.service: Read-only file system
> Jun 15 23:29:49.203637 localhost systemd[1]: Finished Autoyast2 Init Scripts.
> Jun 15 23:29:49.204584 localhost systemd[1]: Starting Permit User Sessions...
> Jun 15 23:29:49.213355 localhost systemd[1]: Finished Permit User Sessions.
> Jun 15 23:29:49.214219 localhost systemd[1]: Starting Hold until boot process finishes up...
> Jun 15 23:29:49.214888 localhost systemd[1]: Starting Terminate Plymouth Boot Screen...
> Jun 15 23:29:49.226576 localhost echo[1369]: Starting mail service (Postfix)
> Jun 15 23:29:49.230451 localhost systemd[1]: Received SIGRTMIN+21 from PID 297 (n/a).
> Jun 15 23:29:49.231051 localhost systemd[1]: Finished Hold until boot process finishes up.
> Jun 15 23:29:49.231365 localhost systemd[1]: Finished Terminate Plymouth Boot Screen.
> Jun 15 23:29:49.232639 localhost systemd[1]: Started Getty on tty1.
> Jun 15 23:29:49.233789 localhost systemd[1]: Started Serial Getty on hvc0.
> Jun 15 23:29:49.235250 localhost systemd[1]: Started Serial Getty on ttyS0.
> Jun 15 23:29:49.236036 localhost systemd[1]: Reached target Login Prompts.
> Jun 15 23:29:49.236279 localhost systemd[1]: Condition check resulted in /etc/init.d/after.local Compatibility being skipped.
> Jun 15 23:29:49.347710 localhost update_postmaps[1397]: chmod: changing permissions of '/etc/postfix/virtual.lmdb': Read-only file system
> Jun 15 23:29:49.348614 localhost update_postmaps[1398]: chmod: changing permissions of '/etc/postfix/transport.lmdb': Read-only file system
> Jun 15 23:29:49.349433 localhost update_postmaps[1399]: chmod: changing permissions of '/etc/postfix/access.lmdb': Read-only file system
> Jun 15 23:29:49.350247 localhost update_postmaps[1400]: chmod: changing permissions of '/etc/postfix/canonical.lmdb': Read-only file system
> Jun 15 23:29:49.351069 localhost update_postmaps[1401]: chmod: changing permissions of '/etc/postfix/sender_canonical.lmdb': Read-only file system
> Jun 15 23:29:49.351848 localhost update_postmaps[1402]: chmod: changing permissions of '/etc/postfix/relocated.lmdb': Read-only file system
> Jun 15 23:29:49.352634 localhost update_postmaps[1403]: chmod: changing permissions of '/etc/postfix/sasl_passwd.lmdb': Read-only file system
> Jun 15 23:29:49.353406 localhost update_postmaps[1404]: chmod: changing permissions of '/etc/postfix/relay_ccerts.lmdb': Read-only file system
> Jun 15 23:29:49.354316 localhost update_postmaps[1405]: chmod: changing permissions of '/etc/postfix/helo_access.lmdb': Read-only file system
> Jun 15 23:29:49.355137 localhost update_postmaps[1406]: chmod: changing permissions of '/etc/postfix/relay.lmdb': Read-only file system
> Jun 15 23:29:49.355915 localhost update_postmaps[1407]: chmod: changing permissions of '/etc/postfix/relay_recipients.lmdb': Read-only file system
> Jun 15 23:29:49.562103 localhost postfix/postfix-script[1475]: starting the Postfix mail system
> Jun 15 23:29:49.567129 localhost postfix/master[1477]: daemon started -- version 3.7.2, configuration /etc/postfix
> Jun 15 23:29:49.658378 localhost systemd[1]: Started Postfix Mail Transport Agent.
> Jun 15 23:29:49.659300 localhost systemd[1]: Started Command Scheduler.
> Jun 15 23:29:49.659429 localhost systemd[1]: Reached target Multi-User System.
> Jun 15 23:29:49.660089 localhost systemd[1]: Starting Record Runlevel Change in UTMP...
> Jun 15 23:29:49.664332 localhost cron[1497]: (CRON) STARTUP (1.5.7)
> Jun 15 23:29:49.664534 localhost cron[1497]: (CRON) INFO (RANDOM_DELAY will be scaled with factor 53% if used.)
> Jun 15 23:29:49.664618 localhost cron[1497]: (CRON) INFO (running with inotify support)
> Jun 15 23:29:49.667787 localhost systemd[1]: systemd-update-utmp-runlevel.service: Deactivated successfully.
> Jun 15 23:29:49.667919 localhost systemd[1]: Finished Record Runlevel Change in UTMP.
> Jun 15 23:29:49.668010 localhost systemd[1]: Startup finished in 945ms (kernel) + 1.793s (initrd) + 9.117s (userspace) = 11.856s.
> Jun 15 23:32:06.409823 localhost systemd[1]: Created slice User Slice of UID 0.
> Jun 15 23:32:06.411259 localhost systemd[1]: Starting User Runtime Directory /run/user/0...
> Jun 15 23:32:06.412553 localhost systemd-logind[617]: New session 1 of user root.
> Jun 15 23:32:06.419157 localhost systemd[1]: Finished User Runtime Directory /run/user/0.
> Jun 15 23:32:06.420285 localhost systemd[1]: Starting User Manager for UID 0...
> Jun 15 23:32:06.424269 localhost (systemd)[1523]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
> Jun 15 23:32:06.489235 localhost systemd[1523]: Queued start job for default target Main User Target.
> Jun 15 23:32:06.490192 localhost systemd[1523]: Created slice User Application Slice.
> Jun 15 23:32:06.490294 localhost systemd[1523]: Started Daily Cleanup of User's Temporary Directories.
> Jun 15 23:32:06.490384 localhost systemd[1523]: Reached target Paths.
> Jun 15 23:32:06.490462 localhost systemd[1523]: Reached target Timers.
> Jun 15 23:32:06.490918 localhost systemd[1523]: Starting D-Bus User Message Bus Socket...
> Jun 15 23:32:06.491389 localhost systemd[1523]: Starting Create User's Volatile Files and Directories...
> Jun 15 23:32:06.498246 localhost systemd[1523]: Finished Create User's Volatile Files and Directories.
> Jun 15 23:32:06.499155 localhost systemd[1523]: Listening on D-Bus User Message Bus Socket.
> Jun 15 23:32:06.499187 localhost systemd[1523]: Reached target Sockets.
> Jun 15 23:32:06.499196 localhost systemd[1523]: Reached target Basic System.
> Jun 15 23:32:06.499217 localhost systemd[1523]: Reached target Main User Target.
> Jun 15 23:32:06.499227 localhost systemd[1523]: Startup finished in 72ms.
> Jun 15 23:32:06.499283 localhost systemd[1]: Started User Manager for UID 0.
> Jun 15 23:32:06.499831 localhost systemd[1]: Started Session 1 of User root.
> Jun 15 23:32:06.503116 localhost login[1392]: pam_unix(login:session): session opened for user root by LOGIN(uid=0)
> Jun 15 23:32:06.503716 localhost login[1392]: ROOT LOGIN ON hvc0
Comment 9 Stefan Hundhammer 2023-06-28 09:07:27 UTC
So, system messages do continue, they are just not visible on the console anymore. Plymouth maintainers, please take over from here.
Comment 10 Richard Fan 2023-06-28 12:47:02 UTC
(In reply to Stefan Hundhammer from comment #7)
> I understand you observed that it's only the system messages on the console
> that have a problem, and that otherwise the system booted alright. Correct?
> 
> 

Correct!

> Jun 15 23:29:49.098101 localhost systemd[1]: 
>   Received SIGRTMIN+21 from PID 297 (plymouthd).
> 
> https://askubuntu.com/questions/956160/received-sigrtmin21-from-pid-251-
> plymouthd
> 
> Some of the answers there seem to suggest that that signal is supposed to
> happen during boot (not sure about that), and that it switches off console
> messages; which might be what you see.
> 
> This might also indicate that there is a problem with plymouth.
Comment 11 Richard Fan 2023-06-28 12:47:45 UTC
(In reply to Stefan Hundhammer from comment #6)
> Does it work when you try to boot into one of the other snapshots?
> 

Other snapshot can't work as well.

> If yes, that means that this particular snapshot may be corrupted (no idea
> why).
Comment 12 Cliff Zhao 2023-06-29 15:10:04 UTC
(In reply to Richard Fan from comment #0)
> ## Reproducible
> Fails since (at least) Build [rfan_autoyast](https://openqa.suse.de/tests/11366532)
> ## Expected result
> Last good: [102.1](https://openqa.suse.de/tests/11164249) (or more recent)
There's no change in Plymouth source during SLES15-SP5, even in these 2 editions.
At least, please indicate which components have been updated here. I think we should start with these components for further analysis. I guess the test plan may have this info or ask the release manager.

> So, system messages do continue, they are just not visible on the console
> anymore. Plymouth maintainers, please take over from here.
This result could not indicate that Plymouth introduced the problem here.
Comment 13 Cliff Zhao 2024-07-17 02:05:07 UTC
(In reply to Cliff Zhao from comment #12)
> (In reply to Richard Fan from comment #0)
> > ## Reproducible
> > Fails since (at least) Build [rfan_autoyast](https://openqa.suse.de/tests/11366532)
> > ## Expected result
> > Last good: [102.1](https://openqa.suse.de/tests/11164249) (or more recent)
> There's no change in Plymouth source during SLES15-SP5, even in these 2
> editions.
> At least, please indicate which components have been updated here. I think
> we should start with these components for further analysis. I guess the test
> plan may have this info or ask the release manager.
> 
> > So, system messages do continue, they are just not visible on the console
> > anymore. Plymouth maintainers, please take over from here.
> This result could not indicate that Plymouth introduced the problem here.