Bugzilla – Bug 1216472
VMs with secure boot do not start (assertion in edk2)
Last modified: 2023-11-21 03:11:22 UTC
Created attachment 870374 [details] VMI startup log One of the KubeVirt tests that runs a VM with Secure Boot fails. The debug logs show an assertion in edk2: ASSERT /home/abuild/rpmbuild/BUILD/edk2-edk2-stable202308/UefiCpuPkg/Library/BaseXApicX2ApicLib/BaseXApicX2ApicLib.c(1478): (Index != 0) || (LevelType == 0x01) Tested with ovmf build from the Virtualization project: qemu-ovmf-x86_64-202308-Virt.1699.256.29.noarch
Hi Vasily,

(In reply to Vasily Ulyanov from comment #0)
> Created attachment 870374 [details]
> VMI startup log
>
> One of the KubeVirt tests that runs a VM with Secure Boot fails. The debug
> logs show an assertion in edk2:
>
> ASSERT
> /home/abuild/rpmbuild/BUILD/edk2-edk2-stable202308/UefiCpuPkg/Library/
> BaseXApicX2ApicLib/BaseXApicX2ApicLib.c(1478): (Index != 0) || (LevelType ==
> 0x01)
>
> Tested with ovmf build from the Virtualization project:
> qemu-ovmf-x86_64-202308-Virt.1699.256.29.noarch

I can not reproduce this issue on my local machine with qemu-ovmf + secure boot. My code/vars are:

<loader readonly='yes' secure='no' type='pflash'>/usr/share/qemu/ovmf-x86_64-code.bin</loader>
<nvram template='/usr/share/qemu/ovmf-x86_64-vars.bin'>/var/lib/libvirt/qemu/nvram/opensuseTW_VARS.fd</nvram>
(In reply to Vasily Ulyanov from comment #0)
> Created attachment 870374 [details]
> VMI startup log

Based on the above log, it looks like the VM used OVMF_CODE.secboot.fd and OVMF_VARS.secboot.fd as EFI firmware:

{"component":"virt-launcher","level":"info","msg":"\t\t<loader readonly=\"yes\" secure=\"yes\" type=\"pflash\">/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>","subcomponent":"libvirt","timestamp":"2023-10-19T17:30:50.321164Z"}
{"component":"virt-launcher","level":"info","msg":"\t\t<nvram template=\"/usr/share/OVMF/OVMF_VARS.secboot.fd\">/tmp/default_testvmi</nvram>","subcomponent":"libvirt","timestamp":"2023-10-19T17:30:50.321177Z"}

But those files are not from the qemu-ovmf-x86_64 RPM.

Hi Vasily,

Do you know where OVMF_CODE.secboot.fd and OVMF_VARS.secboot.fd come from?
(In reply to Joey Lee from comment #1)
>
> I can not reproduce this issue on my local machine with qemu-ovmf + secure
> boot.
>
> My code/vars are:
>
> <loader readonly='yes' secure='no'
> type='pflash'>/usr/share/qemu/ovmf-x86_64-code.bin</loader>
> <nvram
> template='/usr/share/qemu/ovmf-x86_64-vars.bin'>/var/lib/libvirt/qemu/nvram/
> opensuseTW_VARS.fd</nvram>

I guess with `secure='no'` the VM will run **without** Secure Boot, right? This works fine for me as well. The issue is observed when `secure='yes'` and `ovmf-x86_64-smm-ms-code.bin` is used. I will grab the full domxml from the test run and attach it here.

> Do you know where are OVMF_CODE.secboot.fd and OVMF_VARS.secboot.fd from?

Those are just symlinks to `/usr/share/qemu/ovmf-x86_64-smm-ms-code.bin` and `/usr/share/qemu/ovmf-x86_64-smm-ms-vars.bin`:

qemu@testvmi:/> ls -la /usr/share/OVMF/OVMF_CODE.secboot.fd
lrwxrwxrwx 1 root root 35 Nov 1 01:52 /usr/share/OVMF/OVMF_CODE.secboot.fd -> ../qemu/ovmf-x86_64-smm-ms-code.bin
qemu@testvmi:/>
qemu@testvmi:/> ls -la /usr/share/OVMF/OVMF_VARS.secboot.fd
lrwxrwxrwx 1 root root 35 Nov 1 01:52 /usr/share/OVMF/OVMF_VARS.secboot.fd -> ../qemu/ovmf-x86_64-smm-ms-vars.bin
qemu@testvmi:/>
qemu@testvmi:/> rpm -qa | grep ovmf
qemu-ovmf-x86_64-202308-Virt.1699.256.44.noarc
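For anyone who wants to repeat the symlink check above without a shell, here is a minimal Python sketch. The helper name `resolve_firmware` is made up for illustration; it only uses `os.path.islink` and `os.path.realpath` from the standard library, and the two paths in the demo are the ones from the `ls -la` output.

```python
import os


def resolve_firmware(path):
    """Return (is_symlink, resolved_path) for a firmware image path.

    Hypothetical helper for illustration; it is not part of libvirt,
    KubeVirt or the ovmf package.
    """
    return os.path.islink(path), os.path.realpath(path)


if __name__ == "__main__":
    # Paths taken from the comment above; they only exist on a host that
    # ships these symlinks.
    for name in ("/usr/share/OVMF/OVMF_CODE.secboot.fd",
                 "/usr/share/OVMF/OVMF_VARS.secboot.fd"):
        is_link, real = resolve_firmware(name)
        print(f"{name}: symlink={is_link} -> {real}")
```

On a host without those files the helper just reports `symlink=False` and the normalized path, so it is safe to run anywhere.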
Created attachment 870566 [details] Domain xml
(In reply to Vasily Ulyanov from comment #4) > Created attachment 870566 [details] > Domain xml May I know your libvirt version? I got a problem when I tested ms-ovmf image. I filed another issue bsc#1216789.
(In reply to Joey Lee from comment #5)
> (In reply to Vasily Ulyanov from comment #4)
> > Created attachment 870566 [details]
> > Domain xml
>
> May I know your libvirt version? I got a problem when I tested ms-ovmf
> image. I filed another issue bsc#1216789.

Libvirt is from the Virtualization project in OBS. Should be the latest one:

qemu@testvmi:/> rpm -qa | grep libvirt-
system-group-libvirt-20170617-25.2.noarch
libvirt-libs-9.8.0-Virt.1699.1088.7.x86_64
libvirt-daemon-log-9.8.0-Virt.1699.1088.7.x86_64
libvirt-client-9.8.0-Virt.1699.1088.7.x86_64
libvirt-daemon-common-9.8.0-Virt.1699.1088.7.x86_64
libvirt-daemon-driver-qemu-9.8.0-Virt.1699.1088.7.x86_64
I can NOT reproduce the issue on my local machine. Maybe this issue is related to the hardware of the host machine. My environment is:

qemu-8.1.2-1.2.x86_64
libvirt-9.8.0-2.1.x86_64
qemu-ovmf-x86_64-202308-1.2.noarch

The following configuration works:

<os firmware='efi'>
  <type arch='x86_64' machine='pc-q35-8.1'>hvm</type>
  <firmware>
    <feature enabled='yes' name='enrolled-keys'/>
    <feature enabled='yes' name='secure-boot'/>
  </firmware>
  <loader readonly='yes' secure='yes' type='pflash'>/usr/share/qemu/ovmf-x86_64-smm-ms-code.bin</loader>
  <nvram template='/usr/share/qemu/ovmf-x86_64-smm-ms-vars.bin'>/var/lib/libvirt/qemu/nvram/opensuseTW_VARS.fd</nvram>
  <boot dev='hd'/>
</os>

Because ovmf-x86_64-smm-ms-code.bin has default feature settings in /usr/share/qemu/firmware/50-ovmf-x86_64-secure-ms.json, new libvirt auto-adds the 'enrolled-keys' and 'secure-boot' features to the firmware section. It works fine.

I have also tried creating the symlinks as in comment #3, and it still works on my local machine:

<os>
  <type arch='x86_64' machine='pc-q35-8.1'>hvm</type>
  <loader readonly='yes' secure='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>
  <nvram template='/usr/share/OVMF/OVMF_VARS.secboot.fd'>/var/lib/libvirt/qemu/nvram/opensuseTW_VARS.fd</nvram>
  <boot dev='hd'/>
</os>

OVMF_CODE.secboot.fd does NOT have a features definition in any .json file, so the features are not auto-added by libvirt. But the above configuration still works on my local machine with secure boot enabled.
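To make the difference between the two loader paths easier to see, the lookup can be sketched in Python. This is only an illustration, not libvirt's actual matching code: the function name `features_for_loader` is hypothetical, the field names `mapping.executable.filename` and `features` follow the QEMU firmware descriptor format as I understand it, and comparing the path as a literal string (without resolving symlinks) is an assumption based on the behaviour described above.

```python
import glob
import json
import os


def features_for_loader(loader_path, descriptor_dir="/usr/share/qemu/firmware"):
    """Return the 'features' list of the descriptor whose executable
    filename equals loader_path, or None if no descriptor mentions it.

    Hypothetical helper: the comparison is a plain string match, which is
    an assumption and not taken from the libvirt sources.
    """
    for name in sorted(glob.glob(os.path.join(descriptor_dir, "*.json"))):
        with open(name) as fh:
            desc = json.load(fh)
        exe = desc.get("mapping", {}).get("executable", {}).get("filename")
        if exe == loader_path:
            return desc.get("features", [])
    return None


if __name__ == "__main__":
    for path in ("/usr/share/qemu/ovmf-x86_64-smm-ms-code.bin",
                 "/usr/share/OVMF/OVMF_CODE.secboot.fd"):
        print(path, "->", features_for_loader(path))
```

Under these assumptions, the real image path would return a feature list on a host that ships a matching descriptor, while the symlink path would return None, which matches the observation that no features were auto-added for OVMF_CODE.secboot.fd.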
Hm... Actually, KubeVirt tests are run on a VM. So it is a 'nested' scenario. Can this affect the issue somehow? Also, just to make sure, KubeVirt domain has: <features> <acpi/> <smm state="on"/> </features> Do you have those features enabled?
(In reply to Vasily Ulyanov from comment #8) > Hm... Actually, KubeVirt tests are run on a VM. So it is a 'nested' > scenario. Can this affect the issue somehow? > > Also, just to make sure, KubeVirt domain has: > > <features> > <acpi/> > <smm state="on"/> > </features> > > Do you have those features enabled? Yes, my xml has the following features: <features> <acpi/> <apic/> <smm state='on'/> <ioapic driver='qemu'/> </features>
Hi Vasily, Could you please try ovmf in SUSE:ALP:Source:Standard:1.0 repo on IBS? It's edk2-stable202305. It can help to narrow down the scope of edk2 version. Thanks a lot! Joey Lee
(In reply to Joey Lee from comment #10) > Hi Vasily, > > Could you please try ovmf in SUSE:ALP:Source:Standard:1.0 repo on IBS? It's > edk2-stable202305. It can help to narrow down the scope of edk2 version. > > Thanks a lot! > Joey Lee Hi Joey, the version from ALP works fine.
(In reply to Vasily Ulyanov from comment #11)
> (In reply to Joey Lee from comment #10)
> > Hi Vasily,
> >
> > Could you please try ovmf in SUSE:ALP:Source:Standard:1.0 repo on IBS? It's
> > edk2-stable202305. It can help to narrow down the scope of edk2 version.
> >
> > Thanks a lot!
> > Joey Lee
>
> Hi Joey, the version from ALP works fine.

Thanks! Then the issue relates to a change between edk2-stable202305..edk2-stable202308. Could you please share how to build your environment? Or could you share the machine (or virtual machine) with me for debugging?
Hi Vasily,

(In reply to Vasily Ulyanov from comment #4)
> Created attachment 870566 [details]
> Domain xml

Per your xml, the cpu mode is:

<cpu mode="custom" match="exact" check="full">
  <model fallback="forbid">SapphireRapids</model>
  <vendor>Intel</vendor>
  <topology sockets="1" dies="1" cores="1" threads="1"/>
  <feature policy="require" name="ss"/>
  <feature policy="require" name="vmx"/>
  <feature policy="require" name="pdcm"/>
  <feature policy="require" name="hypervisor"/>
  <feature policy="require" name="tsc_adjust"/>
  <feature policy="require" name="cldemote"/>
  <feature policy="require" name="movdiri"/>
  <feature policy="require" name="movdir64b"/>
  <feature policy="require" name="md-clear"/>
  <feature policy="require" name="stibp"/>
  <feature policy="require" name="ibpb"/>
  <feature policy="require" name="ibrs"/>
  <feature policy="require" name="amd-stibp"/>
  <feature policy="require" name="amd-ssbd"/>
  <feature policy="require" name="tsx-ctrl"/>
  <feature policy="require" name="sbdr-ssdp-no"/>
  <feature policy="require" name="fbsdp-no"/>
  <feature policy="require" name="psdp-no"/>
  <feature policy="disable" name="amx-bf16"/>
  <feature policy="disable" name="amx-tile"/>
  <feature policy="disable" name="amx-int8"/>
  <feature policy="disable" name="fzrm"/>
  <feature policy="disable" name="fsrs"/>
  <feature policy="disable" name="fsrc"/>
  <feature policy="disable" name="xfd"/>
</cpu>

Could you please also try to change the cpu mode to this?

<cpu mode='host-model' check='partial'/>

The above model is my setting. It works for me.
(In reply to Vasily Ulyanov from comment #0)
> Created attachment 870374 [details]
> VMI startup log
>
> One of the KubeVirt tests that runs a VM with Secure Boot fails. The debug
> logs show an assertion in edk2:
>
> ASSERT
> /home/abuild/rpmbuild/BUILD/edk2-edk2-stable202308/UefiCpuPkg/Library/
> BaseXApicX2ApicLib/BaseXApicX2ApicLib.c(1478): (Index != 0) || (LevelType ==
> 0x01)
>

The above ASSERT is from GetProcessorLocation2ByApicId(), which deals with the processor. So my direction is to look at differences in the CPU settings.
(In reply to Joey Lee from comment #13) > Hi Vasily > > (In reply to Vasily Ulyanov from comment #4) > > Created attachment 870566 [details] > > Domain xml > > Per your xml, the cpu mode is: > > <cpu mode="custom" match="exact" check="full"> > <model fallback="forbid">SapphireRapids</model> > <vendor>Intel</vendor> > <topology sockets="1" dies="1" cores="1" threads="1"/> > <feature policy="require" name="ss"/> > <feature policy="require" name="vmx"/> > <feature policy="require" name="pdcm"/> > <feature policy="require" name="hypervisor"/> > <feature policy="require" name="tsc_adjust"/> > <feature policy="require" name="cldemote"/> > <feature policy="require" name="movdiri"/> > <feature policy="require" name="movdir64b"/> > <feature policy="require" name="md-clear"/> > <feature policy="require" name="stibp"/> > <feature policy="require" name="ibpb"/> > <feature policy="require" name="ibrs"/> > <feature policy="require" name="amd-stibp"/> > <feature policy="require" name="amd-ssbd"/> > <feature policy="require" name="tsx-ctrl"/> > <feature policy="require" name="sbdr-ssdp-no"/> > <feature policy="require" name="fbsdp-no"/> > <feature policy="require" name="psdp-no"/> > <feature policy="disable" name="amx-bf16"/> > <feature policy="disable" name="amx-tile"/> > <feature policy="disable" name="amx-int8"/> > <feature policy="disable" name="fzrm"/> > <feature policy="disable" name="fsrs"/> > <feature policy="disable" name="fsrc"/> > <feature policy="disable" name="xfd"/> > </cpu> > > Could you please also try to change the cpu mode to this? > > <cpu mode='host-model' check='partial'/> > > The above model is my setting. It works to me. The attached dom.xml is the output of `virsh dumpxml`. Meaning that it is 'adjusted' by libvirt. The original dom.xml already specifies the CPU model as 'host-model'. So those CPU settings actually match the host hardware. I will grab and attach the original domain xml here for reference.
Created attachment 870592 [details] Domain xml (original)
Thanks to Vasily for helping to set up the environment for bisecting. After bisecting edk2-stable202305..edk2-stable202308, the first bad commit is:

From 1fadd18d0c0c65ffde9e128a486414ba43b3387c Mon Sep 17 00:00:00 2001 [edk2-stable202308]
From: "Zhang, Hongbin1" <Hongbin1.Zhang@intel.com>
Date: Mon, 29 May 2023 14:39:38 +0800
Subject: [PATCH 177/271] UefiCpuPkg: Get processor extended information for SmmCpuServiceProtocol

Some features like RAS need to use processor extended information under smm, So add code to support it

Signed-off-by: Hongbin1 Zhang <hongbin1.zhang@intel.com>
Cc: Eric Dong <eric.dong@intel.com>
Reviewed-by: Ray Ni <ray.ni@intel.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Cc: Star Zeng <star.zeng@intel.com>
Reviewed-by: Jiaxin Wu <jiaxin.wu@intel.com>

After reverting this patch, the edk2-stable202308 ovmf works fine on the affected machine. I will check why this change causes the problem.
(In reply to Vasily Ulyanov from comment #0)
> Created attachment 870374 [details]
> VMI startup log
>
> One of the KubeVirt tests that runs a VM with Secure Boot fails. The debug
> logs show an assertion in edk2:
>
> ASSERT
> /home/abuild/rpmbuild/BUILD/edk2-edk2-stable202308/UefiCpuPkg/Library/
> BaseXApicX2ApicLib/BaseXApicX2ApicLib.c(1478): (Index != 0) || (LevelType ==
> 0x01)
>

I have checked the ASSERT code in OVMF, in UefiCpuPkg/Library/BaseXApicX2ApicLib/BaseXApicX2ApicLib.c:GetProcessorLocation2ByApicId:

    //
    // first level reported should be SMT.
    //
    ASSERT ((Index != 0) || (LevelType == CPUID_EXTENDED_TOPOLOGY_LEVEL_TYPE_SMT));
    if (LevelType == CPUID_EXTENDED_TOPOLOGY_LEVEL_TYPE_INVALID) {
      break;
    }

Then I added a debug log to print Index and LevelType on the affected virtual machine, kubevirt-ci-node72. It shows:

    GetProcessorLocation2ByApicId, Index: 0, LevelType: 0
    ASSERT /home/joeyli/source_code-git/edk2/UefiCpuPkg/Library/BaseXApicX2ApicLib/BaseXApicX2ApicLib.c(1479): (Index != 0) || (LevelType == 0x01)

The LevelType returned by the cpuid instruction (CPUID V2 Extended Topology Enumeration Leaf) is CPUID_EXTENDED_TOPOLOGY_LEVEL_TYPE_INVALID, but Index 0 (the first level) should report CPUID_EXTENDED_TOPOLOGY_LEVEL_TYPE_SMT. That is why the ASSERT triggers.

On the affected machine, MaxStandardCpuIdIndex is 31, which is >= CPUID_V2_EXTENDED_TOPOLOGY, so the "CPUID V2 Extended Topology Enumeration Leaf" logic is used. But cpuid should NOT return the CPUID_EXTENDED_TOPOLOGY_LEVEL_TYPE_INVALID level type for the first level.
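The failing check can be reproduced outside the firmware with a small Python simulation. This is a sketch, not the edk2 code: `walk_topology_levels`, `make_fake_cpuid` and `MAX_LEVELS` are names invented here, the CPUID results come from a hand-written table instead of the real instruction, and the only hardware detail assumed is that the level type sits in ECX bits 15:8 of the topology leaf.

```python
LEVEL_TYPE_INVALID = 0x00
LEVEL_TYPE_SMT = 0x01
MAX_LEVELS = 8  # hypothetical cap so the simulation always terminates


def walk_topology_levels(cpuid, leaf):
    """Collect level types from a topology leaf, mirroring the ASSERT.

    cpuid is any callable (leaf, subleaf) -> (eax, ebx, ecx, edx).
    Raises ValueError where the firmware would hit the ASSERT.
    """
    level_types = []
    for index in range(MAX_LEVELS):
        _eax, _ebx, ecx, _edx = cpuid(leaf, index)
        level_type = (ecx >> 8) & 0xFF  # level type: ECX[15:8]
        # Same condition as the ASSERT: the first level must be SMT.
        if index == 0 and level_type != LEVEL_TYPE_SMT:
            raise ValueError(f"Index: {index}, LevelType: {level_type}")
        if level_type == LEVEL_TYPE_INVALID:
            break
        level_types.append(level_type)
    return level_types


def make_fake_cpuid(table):
    """Build a fake cpuid() from a {(leaf, subleaf): regs} table.

    Missing entries read as all zeros, like an unimplemented leaf.
    """
    def cpuid(leaf, subleaf):
        return table.get((leaf, subleaf), (0, 0, 0, 0))
    return cpuid
```

With an empty table (leaf 0x1F reads as all zeros) the walk fails at Index 0 with LevelType 0, which is the same pair of values as in the debug log above.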
Hi James,

Sorry to bother you! As described in my comment #18, do you know what returns the CPUID_EXTENDED_TOPOLOGY_LEVEL_TYPE_INVALID level type for the cpuid instruction: QEMU or the host machine's CPU?

Thanks!
I also filed a bug on tianocore bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4598
(In reply to Joey Lee from comment #20)
> I also filed a bug on tianocore bugzilla:
>
> https://bugzilla.tianocore.org/show_bug.cgi?id=4598

I followed Gerd Hoffmann's suggestion and confirmed that Gerd's 170d4ce8e9 patch in edk2 mainline fixes the problem on the affected machine:

commit 170d4ce8e90abb1eff03852940a69c9d17f8afe5
Author: Gerd Hoffmann <kraxel@redhat.com>
Date: Tue Oct 17 13:28:07 2023 +0200

    UefiCpuPkg/BaseXApicX2ApicLib: fix CPUID_V2_EXTENDED_TOPOLOGY detection

I have tested the 2023-11-15 master branch and also backported the above patch to the edk2-stable202308 ovmf. Both work. I will backport the 170d4ce8e9 patch to our edk2-stable202308 ovmf.
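This thread only quotes the title of the patch, not its body, so the following Python sketch is my reading of what "fix CPUID_V2_EXTENDED_TOPOLOGY detection" has to achieve rather than a copy of the change. It assumes the rule from the Intel SDM, as I understand it, that a topology leaf counts as present only if the maximum standard leaf is high enough AND EBX[15:0] of subleaf 0 is non-zero. The function name `pick_topology_leaf` and the fake cpuid table are invented for illustration.

```python
CPUID_V2_EXTENDED_TOPOLOGY = 0x1F
CPUID_EXTENDED_TOPOLOGY = 0x0B


def pick_topology_leaf(cpuid):
    """Return the topology leaf to use (0x1F, 0x0B) or None.

    cpuid is any callable (leaf, subleaf) -> (eax, ebx, ecx, edx).
    Assumption: checking the max leaf alone is not enough; the leaf must
    also report a non-zero EBX[15:0] in subleaf 0.
    """
    max_leaf = cpuid(0x00, 0)[0]  # EAX of leaf 0: max standard leaf
    for leaf in (CPUID_V2_EXTENDED_TOPOLOGY, CPUID_EXTENDED_TOPOLOGY):
        if max_leaf < leaf:
            continue
        ebx = cpuid(leaf, 0)[1]
        if ebx & 0xFFFF:
            return leaf
    return None


def make_fake_cpuid(table):
    """Build a fake cpuid() from a {(leaf, subleaf): regs} table."""
    def cpuid(leaf, subleaf):
        return table.get((leaf, subleaf), (0, 0, 0, 0))
    return cpuid
```

A table that models the affected machine (max leaf 31, leaf 0x1F all zeros, leaf 0x0B populated) makes the function fall back to 0x0B instead of walking an empty leaf 0x1F.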
The backported 170d4ce8e9 patch has been merged to openSUSE:Factory/ovmf. Setting this issue to FIXED.