Bug 1226279 - kernel panic in systemd-testsuite VM after switch_root
Summary: kernel panic in systemd-testsuite VM after switch_root
Status: RESOLVED INVALID
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: KVM
Version: Current
Hardware: x86-64 openSUSE Tumbleweed
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: systemd maintainers
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-06-13 10:26 UTC by Thomas Blume
Modified: 2024-07-02 07:50 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
fabiano.rosas: needinfo?


Attachments
qemu run log with panic -cpu IvyBridge (43.34 KB, text/plain)
2024-06-13 10:29 UTC, Thomas Blume
qemu run log without panic -cpu Haswell (47.98 KB, text/plain)
2024-06-13 10:30 UTC, Thomas Blume

Description Thomas Blume 2024-06-13 10:26:21 UTC
Running the systemd-testsuite with the next systemd update from:

https://download.opensuse.org/repositories/home:/fbui:/systemd:/v256/openSUSE_Tumbleweed/

results in a kernel panic after switch root:

-->
[  OK  ] Reached target Switch Root.
         Starting Switch Root...
[   33.072568][  T173] systemd-journald[173]: Received SIGTERM from PID 1 (systemd).
[   33.505266][    T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
[   33.505803][    T1] CPU: 0 PID: 1 Comm: systemd Not tainted 6.9.3-1-default #1 openSUSE Tumbleweed fda7b62a76dc21b6c4b75014796eda70ffed4444
[   33.506381][    T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-prebuilt.qemu.org 04/01/2014
[   33.506841][    T1] Call Trace:
[   33.507470][    T1]  <TASK>
[   33.507702][    T1]  dump_stack_lvl+0x5a/0x80
[   33.508279][    T1]  panic+0x10b/0x2be
[   33.508419][    T1]  do_exit.cold+0x14/0x14
[   33.508555][    T1]  ? __slab_free+0xbf/0x2a0
[   33.508700][    T1]  do_group_exit+0x30/0x80
[   33.508856][    T1]  __x64_sys_exit_group+0x18/0x20
[   33.509033][    T1]  do_syscall_64+0x82/0x170
[   33.509175][    T1]  ? __memcg_slab_free_hook+0xef/0x140
[   33.509345][    T1]  ? update_load_avg+0x7e/0x7e0
[   33.509506][    T1]  ? update_load_avg+0x7e/0x7e0
[   33.509659][    T1]  ? set_next_entity+0xd0/0x190
[   33.509809][    T1]  ? finish_task_switch.isra.0+0x99/0x2e0
[   33.509986][    T1]  ? __schedule+0x3d0/0x15b0
[   33.510131][    T1]  ? rcu_core+0x1f2/0x4b0
[   33.510267][    T1]  ? lapic_next_event+0x15/0x20
[   33.510424][    T1]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   33.510729][    T1] RIP: 0033:0x7fd4c32ca1c5
[   33.511135][    T1] Code: 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 90 90 ba e7 00 00 00 eb 06 0f 1f 44 00 00 f4 89 d0 0f 05 <48> 3d 00 f0 ff ff 76 f3 f7 d8 89 05 5b 49 01 00 eb e9 66 0f 1f 84
[   33.511740][    T1] RSP: 002b:00007ffd5cbfa068 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
[   33.512374][    T1] RAX: ffffffffffffffda RBX: 00007fd4c32d33c8 RCX: 00007fd4c32ca1c5
[   33.512623][    T1] RDX: 00000000000000e7 RSI: 00007ffd5cbf9730 RDI: 000000000000007f
[   33.512873][    T1] RBP: 00007fd4c323eaaf R08: 0000000000000001 R09: ffffffffffffffff
[   33.513112][    T1] R10: 00007fd4c329f000 R11: 0000000000000202 R12: 0000000000000002
[   33.513351][    T1] R13: 000000000000000f R14: 00007fd4c323eac0 R15: 0000000000000000
[   33.513625][    T1]  </TASK>
[   33.514204][    T1] Kernel Offset: 0x28a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   33.514734][    T1] Rebooting in 90 seconds..
--<

The panic doesn't appear when a -cpu parameter with model Haswell or newer is added (see the section "Preferred CPU models for Intel x86 hosts" in
https://www.qemu.org/docs/master/system/qemu-cpu-models.html for details), e.g.:


/usr/bin/qemu-system-x86_64 -m 1024M -nographic -cpu Haswell -kernel /boot/vmlinuz-6.9.3-1-default -initrd /boot/initrd-6.9.3-1-default -drive format=raw,cache=unsafe,file=/var/tmp/systemd-tests/default.img -append 'root=LABEL=systemd_boot console=ttyS0'

succeeds, but:

/usr/bin/qemu-system-x86_64 -m 1024M -nographic -cpu IvyBridge -kernel /boot/vmlinuz-6.9.3-1-default -initrd /boot/initrd-6.9.3-1-default -drive format=raw,cache=unsafe,file=/var/tmp/systemd-tests/default.img -append 'root=LABEL=systemd_boot console=ttyS0'

panics.

Reproducer:

1. Install systemd and systemd-testsuite from: 

https://download.opensuse.org/repositories/home:/fbui:/systemd:/v256/openSUSE_Tumbleweed/

2. Go to the testsuite directory and create the test image:

cd /usr/lib/systemd/tests/integration-tests/
NO_BUILD=1 make -C TEST-02-UNITTESTS QEMU_TIMEOUT=90 clean setup

3. Run the above qemu-system-x86_64 command with the -cpu parameter set to IvyBridge or an older model.
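
As a convenience for step 3, the two command lines from the report can be generated from one template so that only the CPU model differs. This is a minimal sketch: the function name `print_qemu_cmd` and the `KERNEL_VERSION` and `IMAGE` overrides are illustrative and not part of the testsuite. It only prints the commands, it does not start QEMU.

```shell
#!/bin/sh
# Print (but do not run) the QEMU command line from the report for one CPU model.
# KERNEL_VERSION and IMAGE are hypothetical overrides; the defaults are the
# values quoted in the report.
print_qemu_cmd() {
    cpu_model=$1
    kver=${KERNEL_VERSION:-6.9.3-1-default}
    image=${IMAGE:-/var/tmp/systemd-tests/default.img}
    printf '%s' "/usr/bin/qemu-system-x86_64 -m 1024M -nographic -cpu $cpu_model"
    printf ' %s' "-kernel /boot/vmlinuz-$kver -initrd /boot/initrd-$kver"
    printf ' %s' "-drive format=raw,cache=unsafe,file=$image"
    printf ' %s\n' "-append 'root=LABEL=systemd_boot console=ttyS0'"
}

# One model that panics and one that boots, according to the report.
for model in IvyBridge Haswell; do
    print_qemu_cmd "$model"
done
```

Each printed line can be pasted into a shell once the test image from step 2 exists.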
Comment 1 Thomas Blume 2024-06-13 10:29:53 UTC
Created attachment 875461
qemu run log with panic -cpu IvyBridge
Comment 2 Thomas Blume 2024-06-13 10:30:59 UTC
Created attachment 875462
qemu run log without panic -cpu Haswell
Comment 3 Thomas Blume 2024-06-13 10:33:03 UTC
The panic also happens when the -cpu parameter is omitted.
Comment 4 Fabiano Rosas 2024-06-14 19:02:00 UTC
I'm having difficulty reproducing this. Can't get past this point:

[   14.652047][  T326] sd 0:0:0:0: [sda] Preferred minimum I/O size 512 bytes
[   14.684284][   T10] scsi 1:0:0:0: Attached scsi generic sg1 type 5
[   14.725152][  T326]  sda: sda1 sda2 sda3 sda4
         Starting dracut initqueue hook...
[  OK  ] Stopped Virtual Console Setup.
[   14.995020][  T312] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[   14.997164][  T312] cdrom: Uniform CD-ROM driver Revision: 3.20
         Stopping Virtual Console Setup...
         Starting Virtual Console Setup...
[  OK  ] Finished Virtual Console Setup.
[  OK  ] Reached target System Initialization.
[  OK  ] Reached target Basic System.

Dracut timeout starts after a while. Any suggestions? This happens with KVM as well, so it's not just emulation slowness.


Also, that panic looks like systemd just died. It doesn't seem to be something kernel-level or virtualization related. So perhaps we could try to isolate what exactly is killing systemd and then see how that relates to the emulated cpu. Would rd.debug be of any help? Or maybe drop to the dracut shell and switch root manually?
Comment 5 Franck Bui 2024-06-17 12:35:38 UTC
I think I've narrowed this issue down.

Depending on the model passed to the '-cpu' option, the path of the shared library "libcrypto.so" that the dynamic linker loads when systemd is executed varies:

When it fails:

  # ldd /sysroot/usr/lib/systemd/systemd
      libcrypto.so.3 => /lib64/libcrypto.so.3

When it succeeds:

  # ldd /sysroot/usr/lib/systemd/systemd
      libcrypto.so.3 => /lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4

Apparently the dynamic loader chooses the version of the library to load based on some of the features supported by the cpu.
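
To illustrate why Haswell is the threshold: the x86-64-v3 level requires AVX, AVX2, BMI1, BMI2, F16C, FMA, LZCNT, MOVBE and OSXSAVE, and IvyBridge lacks several of these (AVX2 among them). The sketch below checks a flag list in /proc/cpuinfo naming. The function name `has_x86_64_v3` is hypothetical, LZCNT is assumed to appear as "abm", and OSXSAVE is left out because /proc/cpuinfo does not list it, so this is only an approximation of what the loader does.

```shell
#!/bin/sh
# Return 0 if the space-separated flag list in $1 (using /proc/cpuinfo names)
# contains the features needed for x86-64-v3, 1 otherwise.
# The first missing flag is reported on stderr.
has_x86_64_v3() {
    flags=" $1 "
    for f in avx avx2 bmi1 bmi2 f16c fma abm movbe; do
        case $flags in
            *" $f "*) ;;
            *) echo "missing: $f" >&2; return 1 ;;
        esac
    done
    return 0
}

# Usage: check the CPU this shell is running on (Linux only).
if [ -r /proc/cpuinfo ]; then
    if has_x86_64_v3 "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"; then
        echo "x86-64-v3 libraries are likely to be selected"
    else
        echo "baseline libraries are likely to be selected"
    fi
fi
```

Run inside the guest, this gives a quick hint about which library directory to expect. The loader itself can also list the glibc-hwcaps subdirectories it considers usable via `/lib64/ld-linux-x86-64.so.2 --help` on recent glibc versions.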

However, when the test image is built, the library is installed in the image based on the output of ldd run against the systemd binary installed on the *host* (from where the VM running the testsuite is spawned). Hence the path of libcrypto is based on the features supported by the host cpu (and in my case it's /lib64/glibc-hwcaps/x86-64-v3/).

But when the VM is spawned, the features supported by the cpu emulated by the VM may differ from the host ones, which explains why in some situations the dynamic linker looks for libcrypto in the wrong place.
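
One conceivable way to make the image independent of the host cpu would be to install the baseline copy of a library alongside the glibc-hwcaps variant that ldd reports. The following is only a sketch of the path mapping: `baseline_lib_path` is a hypothetical helper, not something the systemd test scripts provide, and it assumes the baseline file carries the same name as the hwcaps one.

```shell
#!/bin/sh
# Map a library path under glibc-hwcaps/<level>/ to the matching baseline path
# by dropping that directory pair. Paths without it are printed unchanged.
baseline_lib_path() {
    printf '%s\n' "$1" | sed -E 's#/glibc-hwcaps/[^/]+/#/#'
}

baseline_lib_path /lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4
# prints: /lib64/libcrypto.so.3.1.4
```

An image builder could copy both the original path and the mapped one when the latter exists on the host, so that a guest cpu without x86-64-v3 support still finds the library.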
Comment 6 Franck Bui 2024-06-17 12:40:08 UTC
Fabiano, do you know whether one possible fix would be to pass "-cpu host" to qemu so that the difference between the cpus on the host and in the VM would be minimal?
Comment 7 Fabiano Rosas 2024-06-17 13:50:57 UTC
(In reply to Franck Bui from comment #6)
> Fabiano, do you know whether one possible fix would be to pass "-cpu host"
> to qemu so the difference between the cpus on the host and on the VM  would
> be minimal ?

It would work; however, that option can only be used together with KVM acceleration, which may not be available when you run the tests. The examples from comment #0, for instance, are running entirely emulated.

If the problem is simply that the host cpu has _more_ features than the guest cpu in the test, then the -cpu max option might be useful. It should enable all implemented features.

Another workaround from the virtualization tools perspective would be to pinpoint which feature(s) are causing the issue and try to craft a compatible -cpu option by disabling some features. However, that might not be portable across host machines.
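
The choice between the first two options can be sketched as follows: use "host" when a usable KVM device is present and fall back to "max" otherwise. `pick_qemu_cpu` is a hypothetical helper, and its device-path parameter exists only so the logic can be tried without a real /dev/kvm. Whether "-cpu max" under pure emulation covers every feature the host libraries were selected for depends on the QEMU version, so treat this as a starting point.

```shell
#!/bin/sh
# Print the -cpu model to use: "host" requires KVM acceleration, so fall back
# to "max" when no readable and writable KVM character device is available.
pick_qemu_cpu() {
    kvm_dev=${1:-/dev/kvm}
    if [ -c "$kvm_dev" ] && [ -r "$kvm_dev" ] && [ -w "$kvm_dev" ]; then
        echo host
    else
        echo max
    fi
}

echo "-cpu $(pick_qemu_cpu)"
```

When "host" is chosen, KVM must also be enabled on the QEMU command line (for example with -enable-kvm), otherwise QEMU rejects the model.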
Comment 8 Thomas Blume 2024-06-17 14:00:46 UTC
(In reply to Fabiano Rosas from comment #7)
> (In reply to Franck Bui from comment #6)
> > Fabiano, do you know whether one possible fix would be to pass "-cpu host"
> > to qemu so the difference between the cpus on the host and on the VM  would
> > be minimal ?
> 
> It would work, however that option is restricted to use along with KVM
> acceleration, which may not be available when you run the tests. The
> examples from comment #0, for instance are running entirely emulated.
> 
> If the problem is simply that the host cpu has _more_ features than the
> guest cpu in the test, then the -cpu max option might be useful. It should
> enable all implemented features.
> 
> Another workaround from the virtualization tools perspective would be to
> pin-point which feature(s) is causing the issue and try to craft a
> compatible -cpu option by disabling some features. However that might become
> not be portable across host machines.

Not sure if this is important, but just to mention it: the "host" cpu in the openQA testcase is emulated too, i.e. the openQA machine is also a VM.
So qemu runs nested here.
Comment 9 Fabiano Rosas 2024-06-17 14:41:08 UTC
(In reply to Thomas Blume from comment #8)
> Not sure if this is important but just to mention it, the "host" cpu in the
> openQA testcase is emulated too, e.g. the openQA Machine is also a vm.
> So qemu runs nested here.

Yes, that's relevant. It gives us more options.

Looking at the systemd tests I see a TEST_NESTED_KVM option. Has that been tried? It seems it would run the test guest as a proper nested KVM guest (whereas currently we're merely running an emulated guest inside of a VM). A nested guest would have a cpu with the same feature set as the level 1 guest.
Comment 10 Thomas Blume 2024-06-18 07:10:20 UTC
(In reply to Fabiano Rosas from comment #9)
> (In reply to Thomas Blume from comment #8)
> > Not sure if this is important but just to mention it, the "host" cpu in the
> > openQA testcase is emulated too, e.g. the openQA Machine is also a vm.
> > So qemu runs nested here.
> 
> Yes, that's relevant. It gives us more options.
> 
> Looking at the systemd tests I see a TEST_NESTED_KVM option. Has that been
> tried? It seems it would run the test guest as a proper nested KVM guest
> (whereas currently we're merely running an emulated guest inside of a VM). A
> nested guest would have a cpu with the same feature set as the level 1 guest.

Indeed, that works, as it sets QEMU_KVM, which in turn enables '-cpu host':

-->
    # Let's use KVM if possible
    if [[ -c /dev/kvm ]] && get_bool $QEMU_KVM; then
        qemu_options+=(-machine "accel=kvm" -enable-kvm -cpu host)
    fi
--<

Still, wouldn't that mean qemu cannot run a systemd-based VM on any machine with a CPU older than Haswell?
Comment 11 Fabiano Rosas 2024-06-18 18:15:45 UTC
(In reply to Thomas Blume from comment #10)
> Still, wouldn't that mean qemu cannot run a systemd based VM on any machine
> with a CPU older than Haswell?

Per Franck Bui's analysis I understand this issue exists due to the particular way systemd builds the test image, so only that specific scenario should be affected.

I'd be glad to continue investigating this, but at the moment I don't see any indication that we have an issue anywhere in the virtualization stack.
Comment 12 Thomas Blume 2024-06-19 12:50:12 UTC
(In reply to Fabiano Rosas from comment #11)
> (In reply to Thomas Blume from comment #10)
> > Still, wouldn't that mean qemu cannot run a systemd based VM on any machine
> > with a CPU older than Haswell?
> 
> Per Franck Bui's analysis I understand this issue exists due to the
> particular way systemd builds the test image, so only that specific scenario
> should be affected.
> 
> I'd be glad to continue investigating this, but at the moment I don't see
> indication that we have an issue anywhere in the virtualization stack.

Indeed, I misinterpreted Franck's comment, sorry.
This is a testsuite problem, hence closing the bug.