Bug 1220143

Summary: VM startup intermittently hangs indefinitely at "start job is running for wicked managed network interfaces"
Product: [openSUSE] PUBLIC SUSE Linux Enterprise Server 15 SP6 Reporter: fei wang <fei2.wang>
Component: Basesystem Assignee: wicked maintainers <wicked-maintainers>
Status: NEW --- QA Contact:
Severity: Major    
Priority: P2 - High CC: aginies, cfamullaconrad, claudio.fontana, dfaggioli, fei2.wang, lma, marc.ruehrschneck, mt, poswald, pragyansri.pathi, rtsvetkov, wicked-maintainers
Version: unspecified Flags: mt: needinfo? (fei2.wang)
Target Milestone: ---   
Hardware: x86-64   
OS: SLES 15   
Whiteboard:
Found By: Beta-Customer Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: screenshot for the failure symptom

Description fei wang 2024-02-21 09:43:30 UTC
Background:

We are doing shift-left validation of SLES 15 SP6 Beta 3 for both host and VM (kernel version 6.4.0-150600.5-default) on an Intel server platform and hit an issue that is hard to track down.

Symptom Description:

The VM cannot boot properly and stalls indefinitely at the following stage. The VM does not seem to hang completely, because the cursor is still blinking, but Ctrl+Alt+Del gets no response, the elapsed time of 28s does not change, and the boot process does not proceed. I once left it overnight and found it still in the same state.



Reproduction Steps:

This issue is intermittent and may disappear by itself. We hit it several times while running DPDK Test Suite (DTS) VF cases, which repeatedly start a SLES VM with an Intel E810 VF assigned and shut it down, for roughly one hour. DTS basically uses the command below to start the VM.

taskset -c 5,6,7,8 qemu-system-x86_64  -name vm0 -enable-kvm -pidfile /tmp/.vm0.pid -daemonize -monitor unix:/tmp/vm0_monitor.sock,server,nowait -device e1000,netdev=nttsip1  -netdev user,id=nttsip1,hostfwd=tcp:10.239.182.115:6000-:22 -device vfio-pci,host=0000:37:01.0,id=pt_0 -cpu host -smp 4 -m 10240 -chardev socket,path=/tmp/vm0_qga0.sock,server,nowait,id=vm0_qga0 -device virtio-serial -device virtserialport,chardev=vm0_qga0,name=org.qemu.guest_agent.0 -vnc :1 -drive file=/home/images/vm0.img

Further Note:

According to our observation, this issue is intermittent with a medium probability. When I first hit it, I could still boot the VM by removing the management interface from the qemu CLI (“-device e1000,netdev=nttsip1  -netdev user,id=nttsip1,hostfwd=tcp:10.239.182.115:6000-:22”). In the latest instance of the failure, I could no longer get it to boot even after simplifying the command line to “qemu-system-x86_64 -name vm0 -enable-kvm -daemonize -cpu host -smp 4 -m 10240 -vnc :10 -drive file=/home/images/vm0.img”.
Comment 1 fei wang 2024-02-21 09:47:14 UTC
Created attachment 872891 [details]
screenshot for the failure symptom
Comment 2 Dario Faggioli 2024-02-21 11:17:49 UTC
What version of the qemu package do you have there?

Can you try with '-cpu host,host-phys-bits=on' ?
Comment 3 Lin Ma 2024-02-21 14:15:19 UTC
(In reply to fei wang from comment #0)
> ...... for the latest
> instance of the failure I am not able to get it boot up any longer even I
> simplified the command line to “qemu-system-x86_64 -name vm0 -enable-kvm
> -daemonize -cpu host -smp 4 -m 10240 -vnc :10 -drive
> file=/home/images/vm0.img”.

* If the image /home/images/vm0.img is a sparse file, please make sure there is enough free disk space under /home/images/ on the host.

* I suggest explicitly specifying the image format instead of letting qemu auto-probe it, e.g.:
   get the image format through "qemu-img info /home/images/vm0.img", then add it to the qemu CLI: "-drive file=/home/images/vm0.img,format={raw,qcow2}"

* Please make sure there is enough free virtual disk space inside the image vm0.img for the various mount points.

* In general, the message "start job is running for wicked managed network interfaces" is normal. If this message stays for a long time, it usually means that,
according to the configuration, wicked is waiting for an IP for the NIC.

I have no idea yet why the guest OS stays stuck there overnight, but it seems there is at least one NIC configuration in wicked, and wicked is waiting for an IP according to that configuration.
Even though the simplified qemu CLI you used doesn't contain a virtual NIC, qemu provides a default NIC (e1000) because you didn't explicitly specify the '-nodefaults' option. You can see it by running the 'info network' command in your qemu monitor.
So I suggest:
1. Add the '-nodefaults' option to your simplified qemu CLI to start the VM.
2. If the guest OS starts up successfully, remove the existing wicked NIC configuration.
3. Shut down the VM.
4. Start the VM using your regular qemu CLI.
5. If the guest OS starts up successfully, reconfigure the NIC(s) in wicked.
6. Observe.

BTW, you use vfio to pass a device through to the VM. In this case, the host fully allocates all of the memory (10G) for the qemu instance instead of COW, so please make sure there is enough free memory on the host to avoid OOM.
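
A minimal sketch of the explicit-format suggestion above, assuming the usual "file format: <fmt>" line in `qemu-img info` output. The helper names `image_format` and `drive_arg` are hypothetical and not part of qemu or DTS:

```shell
# Hypothetical helpers; names are illustrative only.

# Read `qemu-img info` output on stdin and print the detected image format.
image_format() {
    awk -F': ' '/^file format:/ { print $2; exit }'
}

# Build a -drive argument that states the format explicitly,
# so qemu does not have to auto-probe it.
# $1 = image path, $2 = image format (e.g. raw or qcow2)
drive_arg() {
    printf '%s\n' "-drive file=$1,format=$2"
}

# Possible usage on the host (not executed here):
#   fmt=$(qemu-img info /home/images/vm0.img | image_format)
#   drive_arg /home/images/vm0.img "$fmt"
```

The printed argument can then replace the plain "-drive file=/home/images/vm0.img" part of the command line.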
Comment 4 fei wang 2024-02-22 09:37:55 UTC
(In reply to Dario Faggioli from comment #2)
> What version of the qemu package do you have there ?
> 
> Can you try with '-cpu host,host-phys-bits=on' ?

Hey Dario,
I tried adding '-cpu host,host-phys-bits=on' but saw neither an improvement nor increased log verbosity. May I know what this parameter does?
Comment 5 fei wang 2024-02-22 09:49:16 UTC
(In reply to Lin Ma from comment #3)
> (In reply to fei wang from comment #0)
> > ...... for the latest
> > instance of the failure I am not able to get it boot up any longer even I
> > simplified the command line to “qemu-system-x86_64 -name vm0 -enable-kvm
> > -daemonize -cpu host -smp 4 -m 10240 -vnc :10 -drive
> > file=/home/images/vm0.img”.
> 
> * If the image /home/images/vm0.img is a sparse based file, Please make sure
> there is enough free disk space for path /home/images/ on the host.
> 
> * I suggest explicitly specifying image format instead of auto probing by
> qemu. e.g.:
>    get the image format information through "qemu-img info
> /home/images/vm0.img", then add the format information to qemu cli: "-drive
> file=/home/images/vm0.img,format={raw,qcow2}"
> 
> * Please make sure there is enough free virtual disk space in image vm0.img
> for various mount points.
> 
> * In general, The message "start job is running for wicked managed network
> interfaces" is normal, If this message stays longer, usually it means:
> according to the configuration, wicked is waiting for an IP for the nic.
> 
> I have no idea yet why the guest os stucks there for overnight, But it seems
> there is at least one nic configuration in wicked and wicked is waiting for
> an IP according to that configuration.
> Even though the simplified qemu cli you used doesn't contain a virtual nic,
> But qemu offers a default nic(e1000) for you because you doesn't explicitly
> specify the '-nodefaults' option, You can see it by perform 'info network'
> command via  your qemu monitor interface.
> So I suggest:
> 1. Add '-nodefaults' option to your simplified qemu cli to start the vm.
> 2. If the guest os successfully start up, then remove the exist wicked nic
> configuration.
> 3. shutoff the vm.
> 4. Start the vm using your regular qemu cli.
> 5. If the guest os successfully start up, then re-config the nic(s) in
> wicked.
> 6. Observe.
> 
> BTW, you used to use vfio to passthrough device to the vm, In this case,
> host fully allocates all of memory(10G) for qemu instance instead of COW, So
> please make sure there is enough free memory on the host to avoid OOM.

WF: 
-The VM is indeed using a sparse qcow2 image, but I am pretty sure, judging from df -h, that there is still plenty of storage space available on both of the systems on which we observe the same failure symptom.

-After booting the VM multiple times, I eventually got it to boot again. I will try the -nodefaults option if I hit consecutive failures.

-The troublesome part is that we are using DPDK Test Suite, an automation tool, to run this test, so the command-line parameters are inherited from there and we don't have much flexibility to customize the qemu CLI. I understand there is some way to do it, but it would take a certain amount of effort, and, what's worse, it means we would have to implement a SLES-specific CLI.

-Both of our systems have 256GB of memory, so I suppose 10GB would not be a problem for them.
Comment 6 Lin Ma 2024-02-22 12:56:34 UTC
As for getting stuck at "start job is running for wicked managed network interfaces", it seems we need to ask the wicked team for further help.
@Claudio, Any thoughts?

Below is a workaround to avoid this issue, you might have a try:

Launch your SLES 15 SP6 VM image and take a look at two files, e.g.:
(I assume that you're using wicked to manage the network in the guest OS and that only one NIC is configured by wicked)
guest:~ # cat /etc/sysconfig/network/ifcfg-eth0
BOOTPROTO='dhcp4'
STARTMODE='auto'

guest:~ # cat /etc/udev/rules.d/70-persistent-net.rules
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="virtio-pci", ATTR{dev_id}=="0x0", KERNELS=="0000:00:03.0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

You can see that a NIC on bus 0 at addr 0x3 will be assigned the interface name "eth0", and there is a wicked network configuration for interface eth0 (ifcfg-eth0).

So, according to the above example, what you need to do is ensure that there is a virtual NIC on bus pci.0 at addr 0x3 when you run DPDK Test Suite, e.g.:
......
-device e1000,netdev=nttsip1,bus=pci.0,addr=0x3
......

If it's hard for you to customize the qemu CLI parameters, you can choose one of these three options:
A. Dig into DPDK Test Suite to see if it allows users to specify the bus number and addr for PCI devices.
B. Regenerate/re-customize your VM image to clean up the network configuration.
C. Use the sles15sp6 JeOS image.
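
As a rough sketch of the check described above, the following lists which PCI address each persistent udev rule pins to an interface name and derives the matching qemu "bus=...,addr=..." suffix. The function names are hypothetical, and the mapping assumes that guest PCI bus 00 corresponds to qemu's "pci.0" root bus; other buses are not handled:

```shell
# Hypothetical helpers; names are illustrative only.

# Print "<pci-address> <ifname>" for each rule in a persistent-net rules file.
# $1 = rules file, e.g. /etc/udev/rules.d/70-persistent-net.rules
list_pinned_nics() {
    sed -n 's/.*KERNELS=="\([^"]*\)".*NAME="\([^"]*\)".*/\1 \2/p' "$1"
}

# Convert a PCI address such as 0000:00:03.0 into a qemu device suffix.
# Assumption: bus 00 in the guest is qemu's "pci.0"; anything else is rejected.
qemu_addr_for() {
    case $1 in
        ????:00:*) ;;
        *) return 1 ;;
    esac
    slot=${1#*:*:}     # strip domain and bus  -> "03.0"
    slot=${slot%.*}    # strip the function    -> "03"
    printf 'bus=pci.0,addr=0x%x\n' "0x$slot"
}
```

For the rule shown in this comment, the two helpers yield the address 0000:00:03.0 for eth0 and the suffix used in the "-device e1000,netdev=nttsip1,bus=pci.0,addr=0x3" example.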
Comment 7 fei wang 2024-02-23 10:03:35 UTC
Currently I have worked around this issue by switching from wicked to the NetworkManager service. So far so good; I am not sure whether there is any obvious drawback/caveat/pitfall in going with NetworkManager.
Comment 8 Marc Ruehrschneck 2024-03-07 14:17:12 UTC
Lin, is switching to networkmanager a valid option? Is this fully supported?
Comment 9 Lin Ma 2024-03-07 16:49:04 UTC
(In reply to Marc Ruehrschneck from comment #8)
> Lin, is switching to networkmanager a valid option? Is this fully supported?

Yes, it is a valid option.
I think NetworkManager is fully officially supported, although I'm not 100% sure. We need the PM or the NetworkManager team to confirm it.
Comment 10 Claudio Fontana 2024-03-08 14:46:43 UTC
As per the question in comment #6, assigning to the wicked maintainers to answer the general question about the wicked symptom and possibly comment #8.
Comment 11 Radoslav Tzvetkov 2024-03-18 10:19:12 UTC
Hi, we are at the final stage of SLE15 SP6. When can we expect this?
Comment 12 Antoine Ginies 2024-04-10 10:59:11 UTC
Based on the comments, this is not a virtualization bug; it is more a wicked behavior change/bug.
Comment 13 Clemens Famulla-Conrad 2024-04-19 09:30:42 UTC
It could be the same issue as mentioned in bsc#1222105, which is fixed in the latest version. Is a revalidation possible?

Thanks in advance.
Comment 14 Marius Tomaschewski 2024-04-22 08:58:12 UTC
(In reply to Clemens Famulla-Conrad from comment #13)
> It could be the the same issue as mentioned in bsc#1222105, which is fixed
> in the latest version. Is a re validation possible?

It would be relevant if there were also a bridge (with the eth interface as port)
with STP enabled and the NIC port were unable to find carrier (which the bridge
inherits).

As you're using Intel E810 VFs, it could be a variant of this Intel E810
NIC reset & ethtool reading issue:
  https://bugzilla.suse.com/show_bug.cgi?id=1215269
Please make sure your kernels and DPDK drivers include this bug fix.
Comment 15 fei wang 2024-04-22 09:02:58 UTC
I tried accessing https://bugzilla.suse.com/show_bug.cgi?id=1222105 and also https://bugzilla.suse.com/show_bug.cgi?id=1215269, but unfortunately I got a "You are not authorized to access bug #" error. Do I need to apply for additional permission? Thanks.
Comment 16 Marc Ruehrschneck 2024-04-22 09:40:26 UTC
Fei, I added you to https://bugzilla.suse.com/show_bug.cgi?id=1215269
Comment 17 fei wang 2024-04-22 10:26:21 UTC
I went through https://bugzilla.suse.com/show_bug.cgi?id=1215269; it was raised by my colleague. Actually, we haven't got any fix plan from our driver team for that issue yet. Also, I tend to believe these are two separate issues, though I am not sure.

Would it be possible to add me to the CC list for https://bugzilla.suse.com/show_bug.cgi?id=1222105? I'd like to check the symptom and solution mentioned there, and will give it a try if possible. Thx.
Comment 18 Marius Tomaschewski 2024-04-22 13:28:00 UTC
Fei, you wrote, "this issue is intermittent with a medium probability"
and "elapsed time 28s doesn’t change and bootup process cannot proceed".

You're not reinstalling the VM, just starting + stopping it, right?
As the "elapsed time 28s doesn’t change" is coming from systemd, it
sounds like the complete VM / kernel freezes.
Is kdump enabled and perhaps there is a kernel dump in /var/crash/…?
It needs quite a while until it gets written.

Could you attach a supportconfig from the same, but working case?
Please also attach the output of `journalctl -o short-precise -b > journal.0.log`
and, if it happened in the previous boot, please also try:
`journalctl -o short-precise -b -1 > journal.1.log`

This would give us some hints about the config+environment in the VM.

When possible, please enable debug log (once, before reboot),
that is 
  WICKED_DEBUG=all
  WICKED_LOG_LEVEL=debug2
as described at https://en.opensuse.org/openSUSE:Bugreport_wicked:
...
 # enable debugging, applied to wickedd*.service as well as to wicked.service aka network.service
 #      (when requested in a bug report to enable debug level 2, use '{$1=debug2}' below)
 perl -i -lpe 's{^(WICKED_DEBUG)=.*}{$1=all};s{^(WICKED_LOG_LEVEL)=.*}{$1=debug}' /etc/sysconfig/network/config
...
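
For reference, a sed-based sketch of the same edit that takes the log level as a parameter, so that debug2 can be requested directly. The function name `set_wicked_debug` is hypothetical, `sed -i` assumes GNU sed, and the config path can be overridden to try it on a copy first:

```shell
# Hypothetical helper; not part of wicked. Assumes GNU sed (-i).
# $1 = config file (default: /etc/sysconfig/network/config)
# $2 = log level   (default: debug; use debug2 when requested)
set_wicked_debug() {
    cfg=${1:-/etc/sysconfig/network/config}
    level=${2:-debug}
    sed -i \
        -e 's/^WICKED_DEBUG=.*/WICKED_DEBUG=all/' \
        -e "s/^WICKED_LOG_LEVEL=.*/WICKED_LOG_LEVEL=$level/" \
        "$cfg"
}

# Possible usage inside the guest, once, before the reboot (not executed here):
#   set_wicked_debug /etc/sysconfig/network/config debug2
```

Like the perl one-liner, this only rewrites lines that already start with WICKED_DEBUG= or WICKED_LOG_LEVEL= and leaves all other settings untouched.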
Comment 19 fei wang 2024-04-23 10:38:17 UTC
Let me try to reproduce the failure, collect more logs, and get back to you. Thanks.