Bug 1217903 - Only one vCPU in VM on RK3588
Summary: Only one vCPU in VM on RK3588
Status: REOPENED
Alias: None
Product: openSUSE Leap Micro
Classification: openSUSE
Component: Kernel:Drivers
Version: 5.5
Hardware: Other Other
Priority: P5 - None, Severity: Normal
Target Milestone: ---
Assignee: Kernel Bugs
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-08 08:34 UTC by Felix Niederwanger
Modified: 2024-06-14 15:36 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
dmesg (57.68 KB, text/plain)
2024-02-06 08:34 UTC, Felix Niederwanger
dmesg of the hypervisor (61.55 KB, text/plain)
2024-02-08 08:08 UTC, Felix Niederwanger

Description Felix Niederwanger 2023-12-08 08:34:21 UTC
I'm running openSUSE Leap Micro 5.5 (kernel 5.14.21-150500.55.36-default) on a Radxa RockPi 5B (Rockchip RK3588 SoC). When I spin up a VM, the VM can only use one vCPU. I've tested this with a Leap and a MicroOS guest, and both run into the same issue at VM startup:

> [    0.003395][    T1] psci: failed to boot CPU1 (-22)
> [    0.003409][    T1] CPU1: failed to boot: -22

And then after some time the system boots, but only with one CPU:

# cat /proc/cpuinfo
processor    : 0
BogoMIPS    : 48.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer    : 0x41
CPU architecture: 8
CPU variant    : 0x4
CPU part    : 0xd0b
CPU revision    : 0

I also observe that one full core is being hogged on the hypervisor, or more specifically by the qemu-system-aarch64 process therein.

I do not observe the same issue when running a fresh Tumbleweed Live system. There I can run a Tumbleweed VM with 2 cores without any issues, so this appears to be limited to Leap Micro 5.5.
Comment 1 Guillaume GARDET 2024-01-08 13:15:15 UTC
Could you provide the full dmesg from host after you reproduced the issue, please?
Comment 2 Felix Niederwanger 2024-02-06 08:24:20 UTC
The issue appears to have been resolved in the meantime, so I'm unable to provide the requested information.

I just updated the test VM and now get two CPUs, as expected.

Closing ticket as resolved/fixed.
Comment 3 Felix Niederwanger 2024-02-06 08:33:34 UTC
Wait, reopening. This issue is not resolved. I was just able to reproduce it when rebooting the VM with 4 vCPUs. One CPU failed to boot:

> [    0.005075] smp: Bringing up secondary CPUs ...
> [    0.030275] Detected VIPT I-cache on CPU1
> [    0.030353] GICv3: CPU1: found redistributor 1 region 0:0x00000000080c0000
> [    0.030548] GICv3: CPU1: using allocated LPI pending table @0x00000000401f0000
> [    0.030695] CPU1: Booted secondary processor 0x0000000001 [0x412fd050]
> [    0.043083] CPU features: detected: Spectre-v4
> [    0.043139] Detected PIPT I-cache on CPU2
> [    0.043182] GICv3: CPU2: found redistributor 2 region 0:0x00000000080e0000
> [    0.043288] GICv3: CPU2: using allocated LPI pending table @0x0000000040200000
> [    0.043370] CPU2: Booted secondary processor 0x0000000002 [0x414fd0b0]
> [    0.044283] psci: failed to boot CPU3 (-22)
> [    0.044310] CPU3: failed to boot: -22
> [    0.044388] smp: Brought up 1 node, 3 CPUs
> [    0.044397] SMP: Total of 3 processors activated.

The RK3588 is a big.LITTLE chip. Could it be that depending on the choice of the target CPU core, it will fail or succeed?
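
As a rough aid for reading the log above: the value in square brackets on each "Booted secondary processor" line is the core's MIDR_EL1 register, whose part-number field identifies the core type. The sketch below is illustrative only. The helper names (decode_midr, summarize) are hypothetical, and the part-number table covers only the two Arm core types in the RK3588. It decodes 0x412fd050 as a Cortex-A55 and 0x414fd0b0 as a Cortex-A76, so vCPU 1 and vCPU 2 appear to have come up on different core types. The return code -22 corresponds to -EINVAL.

```python
import re

# Part numbers for Arm Ltd. (implementer 0x41) cores found on the RK3588.
# This table is deliberately incomplete; unknown parts are reported as hex.
ARM_PARTS = {0xD05: "Cortex-A55", 0xD0B: "Cortex-A76"}

def decode_midr(midr: int) -> dict:
    """Split a MIDR_EL1 value into its architectural fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,
        "variant": (midr >> 20) & 0xF,
        "architecture": (midr >> 16) & 0xF,
        "part": (midr >> 4) & 0xFFF,
        "revision": midr & 0xF,
    }

# Patterns matching the guest kernel messages quoted in this report.
BOOTED = re.compile(
    r"CPU(\d+): Booted secondary processor 0x[0-9a-f]+ \[0x([0-9a-f]+)\]"
)
FAILED = re.compile(r"CPU(\d+): failed to boot: (-?\d+)")

def summarize(log: str):
    """Return ({cpu: core name}, {cpu: error code}) for a guest dmesg text."""
    booted, failed = {}, {}
    for line in log.splitlines():
        match = BOOTED.search(line)
        if match:
            part = decode_midr(int(match.group(2), 16))["part"]
            booted[int(match.group(1))] = ARM_PARTS.get(part, hex(part))
            continue
        match = FAILED.search(line)
        if match:
            failed[int(match.group(1))] = int(match.group(2))
    return booted, failed

# Three lines copied from the guest log in this comment.
SAMPLE = """\
[    0.030695] CPU1: Booted secondary processor 0x0000000001 [0x412fd050]
[    0.043370] CPU2: Booted secondary processor 0x0000000002 [0x414fd0b0]
[    0.044310] CPU3: failed to boot: -22
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running the same function over the full attached dmesg would list every secondary CPU that booted or failed.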
Comment 4 Felix Niederwanger 2024-02-06 08:34:37 UTC
Created attachment 872489
dmesg

Attaching the requested dmesg log from the VM.
Comment 5 Ivan Ivanov 2024-02-07 18:33:13 UTC
Could you also attach kernel log from the host, please?
Comment 6 Felix Niederwanger 2024-02-08 08:08:52 UTC
Created attachment 872570
dmesg of the hypervisor

Of course! I didn't do it because it didn't show anything meaningful to me.
Comment 7 Felix Niederwanger 2024-03-25 08:10:57 UTC
For the record: in the meantime I tried pinning the VM in question to either the big or the LITTLE cores, and the issue remained in both cases.

From libvirt:

>   <cpu mode="host-passthrough" check="none">
>     <topology sockets="1" dies="1" cores="2" threads="1"/>
>   </cpu>

I also tried various vCPU pinning combinations, e.g.

> <vcpu placement='static'>2</vcpu>
> <cputune>
>     <vcpupin vcpu='0' cpuset='0'/>
>     <vcpupin vcpu='1' cpuset='1'/>
> </cputune>

In all configurations I tested (only LITTLE, only big, mixed big.LITTLE cores) I observed the issue described above.
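
To make such pinning experiments repeatable, the libvirt fragments can be generated rather than edited by hand. This is a minimal sketch, not the exact procedure used here: build_pinning is a hypothetical helper, and the host CPU numbers passed to it are placeholders. Which host CPU numbers correspond to big or LITTLE cores is an assumption that has to be checked on the host itself.

```python
from xml.etree import ElementTree as ET

def build_pinning(host_cpus):
    """Return (<vcpu>, <cputune>) XML strings pinning vCPU i to host_cpus[i]."""
    vcpu = ET.Element("vcpu", placement="static")
    vcpu.text = str(len(host_cpus))
    cputune = ET.Element("cputune")
    for index, host_cpu in enumerate(host_cpus):
        # One <vcpupin> per vCPU, each bound to exactly one host CPU.
        ET.SubElement(cputune, "vcpupin", vcpu=str(index), cpuset=str(host_cpu))
    return (
        ET.tostring(vcpu, encoding="unicode"),
        ET.tostring(cputune, encoding="unicode"),
    )

if __name__ == "__main__":
    # Placeholder host CPU numbers; replace with the cores under test.
    for fragment in build_pinning([0, 1]):
        print(fragment)
```

The two printed fragments correspond to the <vcpu> and <cputune> elements quoted above and would be pasted into the domain XML.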
Comment 8 Guillaume GARDET 2024-03-26 11:18:18 UTC
It seems to be https://gitlab.com/qemu-project/qemu/-/issues/1595
which should be fixed in the host kernel by https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7af0c2534f4c57b16e
Comment 9 Guillaume GARDET 2024-03-26 11:21:51 UTC
(In reply to Guillaume GARDET from comment #8)
> It seems to be https://gitlab.com/qemu-project/qemu/-/issues/1595
> which should be fixed in the host kernel by
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7af0c2534f4c57b16e

FYI, this fix is part of kernel 6.3 and later, so Tumbleweed and Leap 15.6/SLE15-SP6 are not affected.
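
A quick way to check a host against that version boundary is to compare the major and minor numbers of its kernel release string. The sketch below assumes a release string in the usual "major.minor.patch-suffix" form; kernel_tuple and upstream_has_fix are hypothetical helper names. It only reflects the upstream version number, so a distribution kernel that carries the fix as a backport would still be reported as older than 6.3.

```python
import re

# Upstream kernel version that contains the fix, per the comment above.
FIX_VERSION = (6, 3)

def kernel_tuple(release: str):
    """Extract (major, minor) from a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))

def upstream_has_fix(release: str) -> bool:
    """True if the upstream base version is 6.3 or later (ignores backports)."""
    return kernel_tuple(release) >= FIX_VERSION

if __name__ == "__main__":
    # Host kernel from the original description of this bug.
    print(upstream_has_fix("5.14.21-150500.55.36-default"))
```

On a running host, the string to pass in is the output of `uname -r`.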