Bug 1221779

Summary: [xen-pv] Install VM, guest vnc console 'has no graphic display device' with 'grub2-x86_64-xen-2.06-150500.29.19.1.noarch' installed on xen host
Product: [openSUSE] openSUSE Distribution
Reporter: Richard Fan <richard.fan>
Component: Xen
Assignee: Michael Chang <mchang>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Major
Priority: P1 - Urgent
CC: bootloader-maintainers, jcao, mchang, mgrifalconi, rtsvetkov, santiago.zarate, volkan.oztuzun, zcjia
Version: Leap 15.5
Target Milestone: ---
Hardware: x86-64
OS: openSUSE Leap 15.5
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: screen output with error messages

Description Richard Fan 2024-03-21 02:01:25 UTC
Xen host information:

#/usr/lib/grub2/x86_64-xen> cat /etc/*release
NAME="openSUSE Leap"
VERSION="15.5"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.5"
PRETTY_NAME="openSUSE Leap 15.5"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.5"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Leap"
LOGO="distributor-logo-Leap"

# rpm -qi grub2-x86_64-xen-2.06-150500.29.19.1.noarch
Name        : grub2-x86_64-xen
Version     : 2.06
Release     : 150500.29.19.1
Architecture: noarch
Install Date: Fri 15 Mar 2024 03:25:28 AM CET
Group       : System/Boot
Size        : 21336931
License     : GPL-3.0-or-later
Signature   : RSA/SHA256, Thu 29 Feb 2024 12:22:41 PM CET, Key ID 70af9e8139db7c82
Source RPM  : grub2-2.06-150500.29.19.1.src.rpm
Build Date  : Thu 29 Feb 2024 12:21:27 PM CET
Build Host  : h04-ch1c
Relocations : (not relocatable)
Packager    : https://www.suse.com/
Vendor      : SUSE LLC <https://www.suse.com/>
URL         : http://www.gnu.org/software/grub/
Summary     : Bootloader with support for Linux, Multiboot and more
Description :
The GRand Unified Bootloader (GRUB) is a highly configurable and customizable
bootloader with modular architecture.  It supports rich variety of kernel formats,
file systems, computer architectures and hardware devices.  This subpackage
provides support for XEN systems.
Distribution: SUSE Linux Enterprise 15


# rpm -qf /usr/lib/grub2/x86_64-xen/grub.xen
grub2-x86_64-xen-2.06-150500.29.19.1.noarch
-----------------------------------------------

## Bug Description 

When installing a VM on this Xen host in "xen-pv" mode, the VNC console fails to show the installation GUI and reports the error message "This VM has no graphic display device".

Sample xen vm configuration file:

  <?xml version="1.0"?>
  <domain type="xen">
    <name>openQA-SUT-8</name>
    <description>openQA WebUI: openqa.suse.de (8): 13823703-sle-15-SP5-Online-QR-x86_64-Build137.7-ext4@svirt-xen-pv</description>
    <memory unit="MiB">1024</memory>
    <vcpu>1</vcpu>
    <os>
      <type>linux</type>
      <kernel>/usr/lib/grub2/x86_64-xen/grub.xen</kernel>
      <boot dev="cdrom"/>
    </os>
    <features>
      <acpi/>
      <apic/>
      <pae/>
    </features>
    <devices>
      <disk type="file" device="cdrom">
        <driver name="qemu" type="raw"/>
        <target dev="sda" bus="scsi"/>
        <source file="/var/lib/libvirt/images/SLE-15-SP5-Online-x86_64-Build137.7-Media1.iso"/>
      </disk>
      <disk type="file" device="disk">
        <driver name="qemu" type="qcow2" cache="unsafe"/>
        <target dev="xvdb" bus="xen"/>
        <source file="/var/lib/libvirt/images/openQA-SUT-8b.img"/>
      </disk>
      <console type="pty">
        <target type="xen" port="0"/>
      </console>
      <graphics type="vnc" port="5908" autoport="no" listen="0.0.0.0" sharePolicy="force-shared" passwd="xxxxxx">
        <listen type="address" address="0.0.0.0"/>
      </graphics>
      <interface type="bridge">
        <virtualport type="openvswitch"/>
        <model type="netfront"/>
        <mac address="00:16:3e:xx:xx:xx"/>
        <source bridge="br0"/>
      </interface>
    </devices>
    <on_reboot>destroy</on_reboot>
  </domain>

As a result, we see many installation failures in our automated tests, such as:

https://openqa.suse.de/tests/13823703#step/setup_libyui/1 [please see attached file as well]

I did some investigation and found that the issue might be caused by the Xen kernel binary file '/usr/lib/grub2/x86_64-xen/grub.xen'. The issue is gone if I use the one from the older package 'grub2-x86_64-xen-2.06-150500.29.13.1.noarch.rpm'; please see job https://openqa.suse.de/tests/13838259#step/setup_libyui/1.
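
To confirm which grub.xen binary a guest actually boots and which package provides it, a sketch along these lines may help. It assumes the domain XML keeps `<kernel>...</kernel>` on a single line, as in the sample configuration above; `kernel_from_domain_xml` is a hypothetical helper name, not part of libvirt.

```shell
#!/bin/sh
# Hypothetical helper: print the <kernel> path from a libvirt domain XML on stdin.
# Assumption: the <kernel> element sits on one line, as in the sample config.
kernel_from_domain_xml() {
    sed -n 's:.*<kernel>\(.*\)</kernel>.*:\1:p' | head -n 1
}

# Possible use on the Xen host (domain name taken from the sample config):
#   kernel=$(virsh dumpxml openQA-SUT-8 | kernel_from_domain_xml)
#   rpm -qf "$kernel"
```

The commented `rpm -qf` step is the same query shown in the host information, applied to whatever path the domain really references.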


Can you please check whether a recent update of the grub2-x86_64-xen package caused the problem?

Let me know if any further information is required.
Comment 1 Richard Fan 2024-03-21 02:02:01 UTC
Created attachment 873677 [details]
screen output with error messages
Comment 2 Jia Zhaocong 2024-03-21 02:09:08 UTC
The serial log shows that "memdisk" is the default boot option:

*memdisk Boot From Hard Disk (/boot/grub/grub.cfg)

It should be "xen/sda SUSE Install".
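
A rough way to pull the selected menu entry out of a serial log is sketched below. It assumes the log is plain text and that the highlighted entry is prefixed with `*`, as in the line quoted above; real serial logs may contain terminal escape sequences that would need stripping first. `selected_entry` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the first GRUB menu entry marked with '*'
# from a serial log read on stdin, without the leading marker.
# Assumption: plain-text log, one menu entry per line.
selected_entry() {
    sed -n 's/^[[:space:]]*\*//p' | head -n 1
}

# Possible use (the log file name is an assumption):
#   selected_entry < serial0.txt
```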

The latest changelog on grub2 is (https://build.suse.de/request/show/322355):

-------------------------------------------------------------------
Thu Feb 22 04:19:21 UTC 2024 - Michael Chang <mchang@suse.com>

- Fix grub.xen memdisk script doesn't look for /boot/grub/grub.cfg
  (bsc#1219248) (bsc#1181762) 
  * grub2-xen-pv-firmware.cfg
  * 0001-disk-Optimize-disk-iteration-by-moving-memdisk-to-th.patch


I think this is related.  CCing developer.
Comment 3 Michael Chang 2024-03-21 03:13:56 UTC
(In reply to Jia Zhaocong from comment #2)
> The serial log shows that "memdisk" is the default boot option:
> 
> *memdisk Boot From Hard Disk (/boot/grub/grub.cfg)
> 
> It should be "xen/sda SUSE Install".
> 
> The latest changelog on grub2 is (https://build.suse.de/request/show/322355):
> 
> -------------------------------------------------------------------
> Thu Feb 22 04:19:21 UTC 2024 - Michael Chang <mchang@suse.com>
> 
> - Fix grub.xen memdisk script doesn't look for /boot/grub/grub.cfg
>   (bsc#1219248) (bsc#1181762) 
>   * grub2-xen-pv-firmware.cfg
>   * 0001-disk-Optimize-disk-iteration-by-moving-memdisk-to-th.patch
> 
> 
> I think this is related.  CCing developer.

Yes. The change makes the script look for /boot/grub/grub.cfg, and as a result the local disk takes precedence over memdisk. However, it did not take into account that a local disk boot entry always precedes cdrom, so this memdisk result should be ignored to avoid creating an invalid local entry.

I'll work on this next week, presumably Monday, as I have other issues going on. Let me know if there is a deadline so I can re-prioritize. Thanks.
Comment 4 Volkan OZTUZUN 2024-03-21 11:01:06 UTC
Hi Michael, this is mainly affecting QU3 RC1 at the moment. The release date (deadline) is currently 09.04, so I believe it would be fine if you start working on it next week. We could check on Tuesday and discuss things further if needed. Thank you.
Comment 5 Radoslav Tzvetkov 2024-03-21 11:02:08 UTC
This also affects the RC phase of 15 SP6, so I increased the priority and the severity.
Comment 6 Michael Chang 2024-03-22 06:35:01 UTC
SRs with the fix have been created for these code streams:

openSUSE: https://build.opensuse.org/request/show/1160540
SLE-15-SP6:GA:  https://build.suse.de/request/show/324550
SLE-15-SP5:Update: https://build.suse.de/request/show/324551

Setting resolution to "fixed" for verification!
Comment 7 Michael Chang 2024-03-27 02:09:43 UTC
To test the fix, you'll have to update grub2-x86_64-xen on the Xen host; the /usr/lib/grub2/x86_64-xen/grub.xen used by libvirt will then include the fix.
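
A minimal sketch for checking whether a host already carries the fixed package is shown below. It assumes the binary grub2-x86_64-xen package has the same version-release as the source package listed in SUSE-RU-2024:1013-1 for Leap 15.5 (2.06-150500.29.22.2), and it uses GNU `sort -V` as an approximation of RPM version ordering. `version_ge` and `check_grub_xen` are hypothetical helper names.

```shell
#!/bin/sh
# Assumption: first fixed version-release on Leap 15.5 / SLE 15 SP5,
# taken from the source package named in SUSE-RU-2024:1013-1.
FIXED_RELEASE="2.06-150500.29.22.2"

# Succeeds when $1 >= $2 under GNU sort -V ordering
# (an approximation of rpm's own version comparison).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Query the installed package and report whether it is at or above the fix.
check_grub_xen() {
    installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' grub2-x86_64-xen) || return 2
    if version_ge "$installed" "$FIXED_RELEASE"; then
        echo "grub2-x86_64-xen $installed: includes the fix"
    else
        echo "grub2-x86_64-xen $installed: older than $FIXED_RELEASE"
        return 1
    fi
}
```

Running `check_grub_xen` on the Xen host (not in the guest) matches the point above: the guest boots the host's grub.xen, so the host package is the one that matters.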
Comment 9 Michael Grifalconi 2024-03-27 08:08:10 UTC
Looks like the issue is still present on 15 SP6 https://openqa.suse.de/tests/13876646
Comment 10 Michael Grifalconi 2024-03-27 08:32:55 UTC
Never mind, please disregard comment #9.

I now understand the problem was in the xen host running the test, so the new build will fail until the update reaches the openQA worker (and does not depend on the new build being tested now).

Richard already confirmed that the fix is working for the host that is running 15.5 and we can only assume the same fix works for 15.6 since we have no openQA workers running 15.6 I think.

Thanks Richard Fan for helping me better understand the situation!
Comment 12 Maintenance Automation 2024-03-27 20:30:07 UTC
SUSE-RU-2024:1013-1: An update that has one fix can now be installed.

Category: recommended (moderate)
Bug References: 1221779
Maintenance Incident: [SUSE:Maintenance:33118](https://smelt.suse.de/incident/33118/)
Sources used:
openSUSE Leap 15.5 (src):
 grub2-2.06-150500.29.22.2
SUSE Linux Enterprise Micro 5.5 (src):
 grub2-2.06-150500.29.22.2
Basesystem Module 15-SP5 (src):
 grub2-2.06-150500.29.22.2
Server Applications Module 15-SP5 (src):
 grub2-2.06-150500.29.22.2

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.