Bug 1215516

Summary: kernel: RT requires X86_FEATURE_CONSTANT_TSC
Product: [openSUSE] PUBLIC SUSE Linux Enterprise Micro 5.5
Reporter: Jose Lausuch <jalausuch>
Component: Base
Assignee: openSUSE Kernel Bugs <kernel-bugs>
Status: NEW ---
QA Contact: Jiri Srain <jsrain>
Severity: Normal
Priority: P2 - High
CC: ddavis, fvogt, jalausuch, lubos.kocman, richard.fan
Version: unspecified   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: leap_micro_journal, journal_after_boot, modprobe kvm_intel, modinfo kvm_intel

Description Jose Lausuch 2023-09-20 11:49:42 UTC
Created attachment 869615 [details]
leap_micro_journal

An openQA test caught this error in the journal after each reboot.
https://openqa.opensuse.org/tests/3587516#step/journal_check/54

> Sep 19 19:57:45.679141 localhost kernel: ppdev: user-space parallel port driver
> Sep 19 19:57:45.699835 localhost systemd[1]: Mounting /boot/writable...
> Sep 19 19:57:45.710894 localhost kernel: BTRFS info (device vda3): disk space caching is enabled
> Sep 19 19:57:45.716955 localhost systemd[1]: Mounting /home...
> Sep 19 19:57:45.735904 localhost systemd[1]: Mounting /opt...
> Sep 19 19:57:45.744443 localhost systemd[1]: Mounting /srv...
> Sep 19 19:57:45.763269 localhost systemd[1]: Mounting /usr/local...
> Sep 19 19:57:45.784965 localhost systemd[1]: Mounted /.snapshots.
> Sep 19 19:57:45.785391 localhost systemd[1]: Mounted /boot/grub2/i386-pc.
> Sep 19 19:57:45.803290 localhost systemd[1]: Mounted /boot/grub2/x86_64-efi.
> Sep 19 19:57:45.803463 localhost systemd[1]: Mounted /boot/writable.
> Sep 19 19:57:45.803617 localhost systemd[1]: Mounted /home.
> Sep 19 19:57:45.803772 localhost systemd[1]: Mounted /opt.
> Sep 19 19:57:45.815497 localhost systemd[1]: Starting Relabel .snapshots...
> Sep 19 19:57:45.823268 localhost systemd[1]: Starting Relabel boot/grub2/i386-pc...
> Sep 19 19:57:45.843869 localhost kernel: RT requires X86_FEATURE_CONSTANT_TSC
> Sep 19 19:57:45.851668 localhost systemd[1]: Starting Relabel boot/grub2/x86_64-efi...
> Sep 19 19:57:45.855177 localhost systemd[1]: Starting Relabel boot/writable...
> Sep 19 19:57:45.871184 localhost systemd[1]: Starting Relabel home...
> Sep 19 19:57:45.892265 localhost systemd[1]: Starting Relabel opt...
Comment 1 Lubos Kocman 2023-09-25 11:26:27 UTC
Jiri mentioned that this is probably because the CPU of the machine is too old.
Comment 2 Lubos Kocman 2023-09-25 11:27:21 UTC
If not, then this should be handled by the kernel team.
Comment 3 Jose Lausuch 2023-09-25 12:58:24 UTC
It could be. These are the workers on which the RT image has been executed so far:
- openqaworker4  -> Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
- openqaworker6  -> Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
- openqaworker20 -> AMD EPYC 7543 32-Core Processor
- openqaworker22 -> AMD EPYC 7773X 64-Core Processor
- openqaworker24 -> AMD EPYC 7773X 64-Core Processor

The errors only showed up on the AMD EPYC processors, so that could be the culprit.

We can do one of two things:
- If the X86_FEATURE_CONSTANT_TSC message is not a problem on AMD processors, we can ignore the bug in the tests.
- If the bug is valid, but the RT image is restricted to certain CPUs, I can force the execution on the Intel workers.

Any thoughts?
Comment 4 Lubos Kocman 2023-09-25 12:59:44 UTC
This is something we should probably check on with Intel and AMD TAMs. Definitely worth a release note if that is the case.
Comment 10 Lubos Kocman 2023-09-29 08:10:05 UTC
Hello team!

I asked QE for further information. QE is a bit under pressure since they are working on three projects with overlapping schedules. I expect that we will have additional information within one week. Sorry for the delay.
Comment 12 Jose Lausuch 2023-09-29 09:25:02 UTC
Created attachment 869835 [details]
journal_after_boot

Full Journal after booting the VM.
Comment 13 Richard Fan 2024-01-12 08:50:38 UTC
I can now hit the issue with SLEM 6.0 as well, with error messages like the one below:

Jan 12 07:34:49.300627 localhost kernel: kvm: RT requires X86_FEATURE_CONSTANT_TSC

sample job:
https://openqa.suse.de/tests/13239389#step/journal_check/22

Here is the host CPU information:

CPU_FLAGS: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca

CPU_MODELNAME:	AMD EPYC 7773X 64-Core Processor

CPU_OPMODE:	32-bit, 64-bit
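
Note that the host flags above already include constant_tsc and nonstop_tsc, so the interesting comparison is with the flags visible inside the guest. A minimal Python sketch of such a check follows. The function name `missing_tsc_flags` and both shortened flag strings are illustrative assumptions, not something taken from the openQA test code.

```python
def missing_tsc_flags(cpu_flags: str, required=("constant_tsc", "nonstop_tsc")) -> list[str]:
    """Return the required flags that are absent from a /proc/cpuinfo-style flags line."""
    present = set(cpu_flags.split())
    return [flag for flag in required if flag not in present]


# Shortened excerpt of the host flags reported above.
host_flags = "fpu vme de pse tsc msr constant_tsc rep_good nopl nonstop_tsc cpuid"
# Hypothetical guest flags line where invtsc was not passed through.
guest_flags = "fpu vme de pse tsc msr rep_good nopl cpuid"

print(missing_tsc_flags(host_flags))   # -> []
print(missing_tsc_flags(guest_flags))  # -> ['constant_tsc', 'nonstop_tsc']
```

Running the same check on the flags line from /proc/cpuinfo inside the VM should show whether the guest kernel actually sees constant_tsc.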
Comment 14 Richard Fan 2024-01-12 09:46:24 UTC
An update on the SLEM 6.0 test: the issue is gone on the setup with an AMD CPU after I changed the QEMU parameters:
1. Added `-cpu host,invtsc=on`
2. Removed `-only-migratable`

Based on my current tests, the issue is not seen on the setup with an Intel CPU.

The openQA jobs are listed here: http://openqa.suse.de/tests/overview?distri=sle-micro&build=rfan0112-3&version=6.0
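
The parameter change can be sketched as a small Python helper that rewrites a QEMU argument list. This is only an illustration: the helper name `enable_invtsc` and the sample command line are assumptions, and it is not how the openQA backend builds its QEMU command.

```python
def enable_invtsc(qemu_args: list[str]) -> list[str]:
    """Return a copy of qemu_args with invtsc=on added to -cpu and -only-migratable dropped."""
    result = []
    args = iter(qemu_args)
    for arg in args:
        if arg == "-only-migratable":
            # invtsc makes the guest non-migratable, so this option has to go.
            continue
        if arg == "-cpu":
            model = next(args, "host")
            if "invtsc=" not in model:
                # An explicit invtsc=... setting is left untouched.
                model += ",invtsc=on"
            result += ["-cpu", model]
            continue
        result.append(arg)
    return result


# Hypothetical, heavily shortened command line.
before = ["qemu-system-x86_64", "-only-migratable", "-cpu", "host", "-m", "2048"]
print(enable_invtsc(before))
# -> ['qemu-system-x86_64', '-cpu', 'host,invtsc=on', '-m', '2048']
```

Dropping `-only-migratable` costs the ability to migrate or snapshot the VM, so this is a workaround for the test setup rather than a general fix.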
Comment 15 Fabian Vogt 2024-01-12 09:58:38 UTC
KVM in the RT kernel requires the "invtsc" CPUID flag, which sets the X86_FEATURE_CONSTANT_TSC cpu feature in the kernel. The openQA tests run QEMU with "-only-migratable -cpu host" which ends up with invtsc disabled because it's not compatible with snapshots. rfan was able to confirm that with -cpu host,invtsc=on, the error messages are gone.

Question is why kvm is being initialized at all during boot and why this only fails on AMD hosts.
Comment 16 Richard Fan 2024-01-15 01:39:06 UTC
(In reply to Fabian Vogt from comment #15)
> KVM in the RT kernel requires the "invtsc" CPUID flag, which sets the
> X86_FEATURE_CONSTANT_TSC cpu feature in the kernel. The openQA tests run
> QEMU with "-only-migratable -cpu host" which ends up with invtsc disabled
> because it's not compatible with snapshots. rfan was able to confirm that
> with -cpu host,invtsc=on, the error messages are gone.
> 
> Question is why kvm is being initialized at all during boot and why this
> only fails on AMD hosts.

Based on my tests, the issue [on SLEM 6.0] is only seen with the 'base-rt' kernel.

There is no such issue with [kernel-base kernel-default]. I don't know whether different kernel versions cause this issue, but please let me know if any further information is required.

-----
[   32.493137][ T1754] kvm: RT requires X86_FEATURE_CONSTANT_TSC
Welcome to SUSE Linux Enterprise Micro 6.0 Beta1 (x86_64) - Kernel 6.4.0-1-rt (ttyS0).
Kernel 6.4.0-1-rt
-----
Comment 17 Fabian Vogt 2024-01-15 08:10:10 UTC
(In reply to Richard Fan from comment #16)
> (In reply to Fabian Vogt from comment #15)
> > KVM in the RT kernel requires the "invtsc" CPUID flag, which sets the
> > X86_FEATURE_CONSTANT_TSC cpu feature in the kernel. The openQA tests run
> > QEMU with "-only-migratable -cpu host" which ends up with invtsc disabled
> > because it's not compatible with snapshots. rfan was able to confirm that
> > with -cpu host,invtsc=on, the error messages are gone.
> > 
> > Question is why kvm is being initialized at all during boot and why this
> > only fails on AMD hosts.
> 
> Base on my test, I can find the issue [on slem 6.0] can only be seen with
> kernel 'base-rt' 
> 
> And no such issue with [kernel-base kernel-default],

That's expected, this is only with the RT kernel.

> I don't know if
> different kernel versions cause this issue. but please let me know if any
> information required.

It's still unclear to me why KVM is apparently getting initialized only on AMD hosts but not on Intel hosts. Can you try to "modprobe kvm_intel" in a SLEM VM with the RT kernel on an Intel host, where this message does not appear?
Comment 18 Richard Fan 2024-01-15 08:41:37 UTC
(In reply to Fabian Vogt from comment #17)
> (In reply to Richard Fan from comment #16)
> > (In reply to Fabian Vogt from comment #15)
> > > KVM in the RT kernel requires the "invtsc" CPUID flag, which sets the
> > > X86_FEATURE_CONSTANT_TSC cpu feature in the kernel. The openQA tests run
> > > QEMU with "-only-migratable -cpu host" which ends up with invtsc disabled
> > > because it's not compatible with snapshots. rfan was able to confirm that
> > > with -cpu host,invtsc=on, the error messages are gone.
> > > 
> > > Question is why kvm is being initialized at all during boot and why this
> > > only fails on AMD hosts.
> > 
> > Base on my test, I can find the issue [on slem 6.0] can only be seen with
> > kernel 'base-rt' 
> > 
> > And no such issue with [kernel-base kernel-default],
> 
> That's expected, this is only with the RT kernel.
> 
> > I don't know if
> > different kernel versions cause this issue. but please let me know if any
> > information required.
> 
> It's still unclear to me why apparently KVM is getting initialized only on
> AMD but not intel hosts. Can you try to "modprobe kvm_intel" in a SLEM VM
> with RT kernel on an Intel host, where this message does not appear?

I did that; please see the attached files for the output.
Comment 19 Richard Fan 2024-01-15 08:45:00 UTC
Created attachment 871870 [details]
modprobe kvm_intel
Comment 20 Richard Fan 2024-01-15 08:45:18 UTC
Created attachment 871871 [details]
modinfo kvm_intel
Comment 21 Fabian Vogt 2024-01-15 09:13:24 UTC
rfan provided access to a SLEM VM on an Intel host, so I had a look. In that VM, the constant_tsc feature is visible in cpuinfo, even though invtsc is not set in the QEMU CPU flags. This is because the kernel unconditionally sets X86_FEATURE_CONSTANT_TSC on any newer Intel CPU, regardless of the invtsc CPUID feature.

From what I can tell, newer AMD CPUs also have a constant TSC, but there the kernel only looks at the CPUID flag, so -cpu host,invtsc=on is necessary.

Is this inconsistency intentional?
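
To make the difference concrete, here is a simplified Python model of the behaviour described above. It is a sketch based on my reading of the early CPU init code for Intel and AMD in kernels around 6.4, not the actual kernel source: the function name is made up, the family/model thresholds are an assumption from that reading, other vendors are left out, and the family/model values in the example calls are only illustrative.

```python
INVARIANT_TSC_BIT = 1 << 8  # CPUID leaf 0x80000007, EDX bit 8 ("invtsc" in QEMU)


def kernel_sets_constant_tsc(vendor: str, family: int, model: int, edx_80000007: int) -> bool:
    """Simplified model of when X86_FEATURE_CONSTANT_TSC ends up set."""
    invtsc = bool(edx_80000007 & INVARIANT_TSC_BIT)
    if vendor == "GenuineIntel":
        # Assumed thresholds: set purely by family/model on newer CPUs,
        # independent of the invtsc CPUID bit.
        by_model = (family == 0xF and model >= 0x03) or (family == 0x6 and model >= 0x0E)
        return by_model or invtsc
    if vendor == "AuthenticAMD":
        # Only the CPUID bit is consulted.
        return invtsc
    raise ValueError("vendor not modelled in this sketch")


# Guest on an Intel host with invtsc masked out by QEMU (illustrative family/model).
print(kernel_sets_constant_tsc("GenuineIntel", 0x6, 0x3F, 0))  # -> True
# Guest on an AMD host with invtsc masked out (illustrative family/model).
print(kernel_sets_constant_tsc("AuthenticAMD", 0x19, 0x01, 0))  # -> False
# Same AMD guest started with -cpu host,invtsc=on.
print(kernel_sets_constant_tsc("AuthenticAMD", 0x19, 0x01, INVARIANT_TSC_BIT))  # -> True
```

In this model the RT KVM check passes on the Intel guest even without invtsc, while the AMD guest only passes once the CPUID bit is exposed, which matches what the tests show.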