Bugzilla – Bug 1219173
[Build 47.2] Boot to a previous snapshot failed after migration
Last modified: 2024-02-27 02:24:15 UTC
Created attachment 872179 [details]
boot.msg

Steps:
- Boot an image of SLES12SP5, register the system via SCC, and fully patch the system
- Reboot and perform an offline migration to SLES15SP6 Build47.2 via Full media or proxySCC
- After the migration to SLES15SP6 succeeds, reboot the system again and boot into a snapshot (the SLES12SP5 snapshot taken before the upgrade)

Observed:
The system cannot boot into the SLES12SP5 snapshot; see the openQA test results below:

openQA test in scenario sle-15-SP6-Migration-from-SLE12-SPx-x86_64-offline_sles12sp5_media_we-lp_def_full@64bit fails in
[snapper_rollback](https://openqa.suse.de/tests/13328325/modules/snapper_rollback/steps/3)
and
https://openqa.suse.de/tests/13328326#step/grub_test_snapshot/20

I have attached some logs to this bug report. If more logs are needed, please feel free to contact me. Thanks.
Created attachment 872180 [details] boot.log
Created attachment 872181 [details] snapper.log
This is the last good job, with Build45.1:
https://openqa.suse.de/tests/13086727#step/snapper_rollback/1

I have also tried doubling the timeout and increasing the CPU count and memory of the openQA worker, but that does not resolve the problem.
(In reply to Chenzi Cao from comment #1)
> Created attachment 872180 [details]
> boot.log

It could be that the X Display Manager failed to start, so there is no graphical session / desktop in the end ...

[FAILED] Failed to start X Display Manager.
See 'systemctl status display-manager.service' for details.

However, the serial terminal was doing fine according to the log.

https://openqa.suse.de/tests/13328325/logfile?filename=serial0.txt

(In reply to Chenzi Cao from comment #0)
[snip]
> I attach some logs for this bug report, if more logs needed here, please
> feel free to contact me, thanks.

Is it possible to collect the output of 'systemctl status display-manager.service' from the serial console?
Thanks.
Perhaps the failure to start the X Display Manager doesn't matter, given that the system would offer a text console instead.

The boot log also has:

[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.

Not sure why no text console was spawned in the end.

Hi Franck and Thomas:

Do you have a better idea why there's no console for the user?
Thanks.
(In reply to Michael Chang from comment #5)
> Perhaps the failure in starting X Display Manager doesn't matter given it
> would offer a text console instead.
>
> The boot log also has:
>
> [  OK  ] Reached target Multi-User System.
> [  OK  ] Reached target Graphical Interface.
>
> Not sure why no text console spawned in the end.
>
> Hi Franck and Thomas:
>
> Did you have better idea why there's no console for the user ?
> Thanks.

I think that is a display-manager issue.

In https://openqa.suse.de/tests/13328326/logfile?filename=serial0.txt and https://openqa.suse.de/tests/13328325/logfile?filename=serial0.txt I can see errors when starting the display-manager, e.g.:

-->
[ 24.864316] display-manager[1301]: Starting service gdm..unused
[ 24.940783] systemd[1]: display-manager.service: Control process exited, code=exited status=6
[ 24.943471] systemd[1]: Failed to start X Display Manager.
[ 24.944431] systemd[1]: display-manager.service: Unit entered failed state.
[ 24.945395] systemd[1]: display-manager.service: Failed with result 'exit-code'.
--<

-->
[ 24.563308] display-manager[1336]: Starting service gdm..unused
[ 24.642150] systemd[1]: display-manager.service: Control process exited, code=exited status=6
[ 24.644548] systemd[1]: Failed to start X Display Manager.
[ 24.645460] systemd[1]: display-manager.service: Unit entered failed state.
[ 24.646428] systemd[1]: display-manager.service: Failed with result 'exit-code'.
--<
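
For triaging serial logs like the two excerpts above, here is a minimal Python sketch that pulls the failing unit and its exit status out of the systemd messages. It is not part of openQA or systemd; the function name find_control_exits and the sample string are illustrative assumptions, and the pattern only covers the message format shown in this report.

```python
import re

# Matches systemd lines such as:
# "[ 24.940783] systemd[1]: display-manager.service: Control process exited, code=exited status=6"
CONTROL_EXIT = re.compile(
    r"systemd\[1\]: (?P<unit>\S+): Control process exited, "
    r"code=exited status=(?P<status>\d+)"
)


def find_control_exits(log_text):
    """Return (unit, exit_status) pairs for every control-process exit in the log.

    Hypothetical helper for this report; not an openQA or systemd API.
    """
    return [
        (m.group("unit"), int(m.group("status")))
        for m in CONTROL_EXIT.finditer(log_text)
    ]


# Sample copied from the serial0.txt excerpt quoted above.
sample = """\
[ 24.864316] display-manager[1301]: Starting service gdm..unused
[ 24.940783] systemd[1]: display-manager.service: Control process exited, code=exited status=6
[ 24.943471] systemd[1]: Failed to start X Display Manager.
"""

print(find_control_exits(sample))
```

The exit status is the interesting part here: it is the value returned by the display-manager start script, so it narrows down which check in that script failed.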
(In reply to Michael Chang from comment #5)
> Perhaps the failure in starting X Display Manager doesn't matter given it
> would offer a text console instead.

Ah sorry, I didn't answer your question.

When the graphical target is set as the default, there is no fallback to multi-user.target if the graphical display manager fails to start. You would see a console login when switching to another console, but the console where gdm is supposed to start will not show one.
Created attachment 872346 [details]
full log

Please find attached the complete saved contents of /var/log. I will collect the output of 'systemctl status display-manager.service' ASAP (I ran into some problems when switching to a tty to collect the info).
(In reply to Michael Chang from comment #4)

> Is it possible to collect the output of `systemctl status
> display-manager.service' from the serial console ?
> Thanks.

I cannot collect it from the serial console, because the system hangs when booting into the snapshot.
I have CC'd the display-manager maintainers so they can take a look at this issue. Thanks.
(In reply to Chenzi Cao from comment #11)
> (In reply to Michael Chang from comment #4)
>
> > Is it possible to collect the output of `systemctl status
> > display-manager.service' from the serial console ?
> > Thanks.
>
> I can not collect it from serial console, the system hang when booting to
> snapshot.
> And I cc display-manager maintainers here to take a look at this issue,
> thanks.

Hi Chenzi,

Thanks for your help. Since this is not bootloader related, I am reassigning it to the display-manager maintainers, as they are investigating it.
Here's a similar migration scenario that passes when booting into snapshots after migration:
https://openqa.suse.de/tests/13549303#step/snapper_rollback/1
In general, the issue is still obscure to me, but the full log gave me some ideas about what was going on:

https://bugzilla.suse.com/show_bug.cgi?id=1219173#c9

We can clearly see that display-manager refuses to even attempt to launch the process:

> 2024-01-25T08:45:15.542481+01:00 susetest display-manager[1352]: Starting service gdm..unused
> 2024-01-25T08:45:15.544819+01:00 susetest systemd[1]: display-manager.service: Control process exited, code=exited status=6

Exit status 6 comes from /usr/lib/X11/display-manager when /usr/bin/X is missing or not executable:

> if [ ! -x /usr/bin/X -a "$DISPLAYMANAGER_REMOTE_ACCESS" = "no" ];
> then
>     exit 6
> fi

I suspect that either some links or core files went missing in the middle of the migration (e.g. in the snapshot taken before the update), or something is wrong with the base sle12sp5 image used before the migration. I am not sure whether it is a real bug or something else.

Regarding Chenzi's comment#13, a different base sle12sp5 image was used for that test, which may explain why it passed.
/usr/bin/X should be a symlink to /usr/bin/Xorg. Check both.
(In reply to Stefan Dirsch from comment #15)
> /usr/bin/X should be a symlink to /usr/bin/Xorg. Check both.

That is true for SLE 15, but the snapshot was taken on SLE 12. At that time, I think /usr/bin/X was linked to /var/lib/X11/X; I saw this on line 611 at:

https://build.suse.de/package/view_file/SUSE:SLE-12-SP5:Update/xorg-x11-server/xorg-x11-server.spec
Ok. But there /var/lib/X11/X is then linked to /usr/bin/Xorg
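
To check the link chain discussed above on a booted system, a small Python sketch like the following could help. It is only an illustration: the helper names symlink_chain and is_launchable are made up for this comment, and the chain /usr/bin/X -> /var/lib/X11/X -> /usr/bin/Xorg is the expectation stated in this thread, not something the script knows about.

```python
import os


def symlink_chain(path, max_hops=16):
    """Follow symlinks hop by hop and return every path visited.

    Hypothetical helper; stops at the first non-link (or dangling target).
    """
    chain = [path]
    for _ in range(max_hops):
        if not os.path.islink(path):
            break
        target = os.readlink(path)
        # Relative targets are resolved against the directory of the link itself.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        chain.append(path)
    return chain


def is_launchable(path):
    """True if the fully resolved path exists and is executable, like `[ -x path ]`."""
    return os.access(path, os.X_OK)


if __name__ == "__main__":
    for hop in symlink_chain("/usr/bin/X"):
        print(hop, "exists" if os.path.lexists(hop) else "MISSING")
    print("executable:", is_launchable("/usr/bin/X"))
```

If any hop is reported as MISSING, the `-x /usr/bin/X` test in the display-manager script fails even though the /usr/bin/X link itself is present.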
I downloaded the base system of SLE-12-SP5 (HDD_1 in https://openqa.suse.de/tests/13395441#settings) used in the problematic test, and found that its btrfs layout has /var/ as a separate subvolume, which should only be the case for SLE-15 products (https://www.suse.com/releasenotes/x86_64/SUSE-SLES/15/index.html#fate-325797). As a result, everything in /var/ is excluded from the pre-update snapshot, including the /var/lib/X11/X required by /usr/bin/X and display-manager.

> ...
> ID 260 gen 191 top level 257 path @/var
> ID 261 gen 153 top level 257 path @/usr/local
> ID 262 gen 189 top level 257 path @/tmp
> ID 263 gen 95 top level 257 path @/srv
> ID 264 gen 188 top level 257 path @/root
> ID 265 gen 24 top level 257 path @/opt
> ID 266 gen 20 top level 257 path @/boot/grub2/x86_64-efi
> ID 267 gen 34 top level 257 path @/boot/grub2/i386-pc
> ID 270 gen 26 top level 260 path @/var/lib/machines
> ...

Chenzi, can you check how the base system was created? To unblock the test, it's better to figure out how @/var was created in the testing environment.

By the way, you can see the regular SLE-12-SP5 btrfs layout by installing it and running "btrfs subvolume list /":

> ID 260 gen 221 top level 257 path @/home
> ID 261 gen 50 top level 257 path @/boot/grub2/i386-pc
> ID 262 gen 16 top level 257 path @/boot/grub2/x86_64-efi
> ID 263 gen 42 top level 257 path @/opt
> ID 264 gen 46 top level 257 path @/srv
> ID 265 gen 221 top level 257 path @/tmp
> ID 266 gen 221 top level 257 path @/usr/local
> ID 267 gen 220 top level 257 path @/var/cache
> ID 268 gen 196 top level 257 path @/var/crash
> ID 269 gen 45 top level 257 path @/var/lib/libvirt/images
> ID 270 gen 41 top level 257 path @/var/lib/machines
> ID 271 gen 42 top level 257 path @/var/lib/mailman
> ID 272 gen 27 top level 257 path @/var/lib/mariadb
> ID 273 gen 28 top level 257 path @/var/lib/mysql
> ID 274 gen 220 top level 257 path @/var/lib/named
> ID 275 gen 30 top level 257 path @/var/lib/pgsql
> ID 276 gen 221 top level 257 path @/var/log
> ID 277 gen 42 top level 257 path @/var/opt
> ID 278 gen 220 top level 257 path @/var/spool
> ID 279 gen 221 top level 257 path @/var/tmp
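
The difference between the two layouts above can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: it parses text in the "btrfs subvolume list /" format shown here, the helper names subvolume_paths and covering_subvolume are hypothetical, and the two sample listings are shortened copies of the ones in this comment. A file that sits inside its own subvolume is not captured by a snapshot of the root subvolume, which is why /var/lib/X11/X matters.

```python
def subvolume_paths(listing):
    """Extract filesystem paths ('/var', '/usr/local', ...) from subvolume list output.

    Hypothetical helper; assumes the SLE convention of an '@' prefix on subvolumes.
    """
    paths = []
    for line in listing.splitlines():
        line = line.lstrip("> ").strip()  # tolerate quoted lines from this thread
        if " path " not in line:
            continue
        sub = line.split(" path ", 1)[1]
        path = sub[1:] if sub.startswith("@") else "/" + sub
        if path:  # skip the top-level '@' subvolume itself
            paths.append(path)
    return paths


def covering_subvolume(file_path, paths):
    """Return the longest subvolume path containing file_path, or None."""
    hits = [p for p in paths if file_path == p or file_path.startswith(p + "/")]
    return max(hits, key=len, default=None)


# Shortened copies of the two listings quoted in this comment.
problematic = """\
ID 260 gen 191 top level 257 path @/var
ID 261 gen 153 top level 257 path @/usr/local
ID 270 gen 26 top level 260 path @/var/lib/machines
"""
regular = """\
ID 267 gen 220 top level 257 path @/var/cache
ID 270 gen 41 top level 257 path @/var/lib/machines
ID 276 gen 221 top level 257 path @/var/log
"""

for name, listing in (("problematic", problematic), ("regular", regular)):
    owner = covering_subvolume("/var/lib/X11/X", subvolume_paths(listing))
    print(name, "->", owner or "root subvolume (included in snapshots)")
```

With the problematic image the file lands in @/var, so it is left out of the pre-update snapshot; with the regular SLE-12-SP5 layout no separate subvolume covers it and it stays in the root subvolume.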
(In reply to Yifan Jiang from comment #18)
> I downloaded the base system of SLE-12-SP5 (HDD_1 in
> https://openqa.suse.de/tests/13395441#settings) in the problematic testing,
> and found the btrfs layout takes /var/ as a separate subvolume, which should
> only be the case for SLE-15 products
> (https://www.suse.com/releasenotes/x86_64/SUSE-SLES/15/index.html#fate-325797).
> So this will exclude everything in /var/ attend the pre update
> snapshot, including the /var/lib/X11/X required by /usr/bin/X and
> display-manager.
>
> > ...
> > ID 260 gen 191 top level 257 path @/var
> > ID 261 gen 153 top level 257 path @/usr/local
> > ID 262 gen 189 top level 257 path @/tmp
> > ID 263 gen 95 top level 257 path @/srv
> > ID 264 gen 188 top level 257 path @/root
> > ID 265 gen 24 top level 257 path @/opt
> > ID 266 gen 20 top level 257 path @/boot/grub2/x86_64-efi
> > ID 267 gen 34 top level 257 path @/boot/grub2/i386-pc
> > ID 270 gen 26 top level 260 path @/var/lib/machines
> > ...
>
> Chenzi, can you check how the base system was created? To unblock the test,
> it's better to figure out how the @/var was created in the testing
> environment. btw. you can refer the regular SLE-12-SP5 btrfs layout by
> installing it and check "btrfs subvolume list /":

Hi Yifan, the base system image is created by an autoyast installation:
https://openqa.suse.de/tests/13577200#
Its autoyast profile:
https://openqa.suse.de/tests/13577200/settings/yam/autoyast/support_images/sles12sp5_install_default_patterns_x86_64.xml

I'll check the autoyast profile ASAP.

>
> > ID 260 gen 221 top level 257 path @/home
> > ID 261 gen 50 top level 257 path @/boot/grub2/i386-pc
> > ID 262 gen 16 top level 257 path @/boot/grub2/x86_64-efi
> > ID 263 gen 42 top level 257 path @/opt
> > ID 264 gen 46 top level 257 path @/srv
> > ID 265 gen 221 top level 257 path @/tmp
> > ID 266 gen 221 top level 257 path @/usr/local
> > ID 267 gen 220 top level 257 path @/var/cache
> > ID 268 gen 196 top level 257 path @/var/crash
> > ID 269 gen 45 top level 257 path @/var/lib/libvirt/images
> > ID 270 gen 41 top level 257 path @/var/lib/machines
> > ID 271 gen 42 top level 257 path @/var/lib/mailman
> > ID 272 gen 27 top level 257 path @/var/lib/mariadb
> > ID 273 gen 28 top level 257 path @/var/lib/mysql
> > ID 274 gen 220 top level 257 path @/var/lib/named
> > ID 275 gen 30 top level 257 path @/var/lib/pgsql
> > ID 276 gen 221 top level 257 path @/var/log
> > ID 277 gen 42 top level 257 path @/var/opt
> > ID 278 gen 220 top level 257 path @/var/spool
> > ID 279 gen 221 top level 257 path @/var/tmp
(In reply to Chenzi Cao from comment #19)
> >
> > Chenzi, can you check how the base system was created? To unblock the test,
> > it's better to figure out how the @/var was created in the testing
> > environment. btw. you can refer the regular SLE-12-SP5 btrfs layout by
> > installing it and check "btrfs subvolume list /":
>
> Hi Yifan, the base system image is created by autoyast installation:
> https://openqa.suse.de/tests/13577200#
> Its autoyast profile:
> https://openqa.suse.de/tests/13577200/settings/yam/autoyast/support_images/sles12sp5_install_default_patterns_x86_64.xml
>
> And I'll check the autoyast profile ASAP.
>

I updated the autoyast profile and the issue is gone; see the test results:
https://openqa.suse.de/tests/13602129# (autoyast installation job that creates the image from the updated profile)
https://openqa.suse.de/tests/13602160#step/snapper_rollback/1 (migration job that verifies the image, no problems in rollback)

I'll submit a PR to modify the autoyast profile. Thanks for your efforts on this issue.
Thanks, closing.
Verified: after the autoyast profile was updated, I reran the related jobs and they all pass now:
https://openqa.suse.de/tests/13603170#step/snapper_rollback/1
https://openqa.suse.de/tests/13604338#step/snapper_rollback/1