Bugzilla – Bug 1219986
virsh iface-list hangs when initialization of libnetcontrol/wicked fails
Last modified: 2024-05-10 08:01:08 UTC
Created attachment 872787 [details] compressed log output of virtinterfaced Running ~> virsh iface-list as normal user hangs forever, while other commands like `virsh list` work. This is with the non-monolithic setup of libvirt, i.e. with `virtinterfaced`. The output of `virtinterfaced` started manually with `-v` is attached.
Also reproducible with libvirt 10.0 from the Virtualization repo on OBS.
I believe the problem is that the initialization of libnetcontrol fails in `src/interface/interface_backend_netcf.c` since I am not root. According to the code the function still returns 0 (success): int netcfIfaceRegister(void) { struct netcf *netcf; /* Initialization of libnetcontrol will fail if NetworkManager is enabled. * Skip registration if ncf_init fails. * TODO: finer-grained check? E.g. is_nm_enabled() */ if (ncf_init(&netcf, NULL) != 0) { VIR_WARN("Failed to initialize libnetcontrol. Management of interface devices is disabled"); return 0; } This prevents the fallback to udev in `src/interface/interface_driver.c`: int interfaceRegister(void) { #ifdef WITH_NETCF /* Attempt to load the netcf based backend first */ if (netcfIfaceRegister() == 0) return 0; #endif /* WITH_NETCF */ #ifdef WITH_NETCONTROL /* Attempt to load the netcontrol based backend, which is a slightly patched netcf backend */ if (netcfIfaceRegister() == 0) return 0; #endif /* WITH_NETCONTROL */ #if WITH_UDEV /* If there's no netcf or netcontrol, or it failed to load, register the udev backend */ if (udevIfaceRegister() == 0) return 0; #endif /* WITH_UDEV */ return -1; } And thus virtinterfaced ends up without an interface driver, but the `interfaceRegister` succeeded.
fix is here: https://github.com/openSUSE/libvirt/pull/1
(In reply to Tiziano Müller from comment #3) > fix is here: https://github.com/openSUSE/libvirt/pull/1 Thanks for the patch, and sorry for not enabling merge requests on openSUSE/libvirt project. Maybe something for the future. In the meantime we can handle the issue here. if (ncf_init(&netcf, NULL) != 0) { VIR_WARN("Failed to initialize libnetcontrol. Management of interface devices is disabled"); - return 0; + return -1; } With this change, virtinterfaced doesn't even start, right? When using the monolithic libvirtd, I'm pretty sure it wouldn't start with this change.
I've rolled my patch out to 3 machines (2 with libvirt-9, 1 with libvirt-10). On all of them does virtinterfaced start perfectly (as unprivileged user): tmueller@infra1:~> pgrep virtinterfaced tmueller@infra1:~> virsh iface-list Name State MAC Address ------------------------------------ eth0 active b8:3f:d2:a1:c8:86 hsn0 active 02:00:00:00:00:00 tmueller@infra1:~> pgrep virtinterfaced 8194 The key is in #c2: If netcfIfaceRegister returns with 0, interfaceRegister returns with 0, but netcontrol is not initialised and unvailable. In fact, no backend is available, but libvirt assumes at least one has been successfully initialised because interfaceRegister did not return with a -1. Hence the call to iface-list hangs. If netcfIfaceRegister returns a -1, there is a fallback to udevIfaceRegister, which actually works. If it does not, then virtinterfaced would not start. libvirtd.service starts as well (as root). I still have to figure out how to use the monolithic daemon with qemu:///session as `virsh iface-list` as normal user automatically starts virtinterfaced.
(In reply to Tiziano Müller from comment #5) > The key is in #c2: If netcfIfaceRegister returns with 0, interfaceRegister > returns with 0, but netcontrol is not initialised and unvailable. In fact, > no backend is available, but libvirt assumes at least one has been > successfully initialised because interfaceRegister did not return with a -1. > Hence the call to iface-list hangs. Nod. I should have read that comment closer :-). > If netcfIfaceRegister returns a -1, there is a fallback to > udevIfaceRegister, which actually works. If it does not, then virtinterfaced > would not start. Heh, yeah, I forgot about the udev interface backend. It, and the interface driver as a whole, do not get any love upstream. In fact, the idea of deprecating it has been floated https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/REKSP5LHZHC7TQ3FRDNUKRVQHJOD7GKL/#MWEFFP6W5IK4XPBKTGUB2TKTUGW3ELAK Do you use the virInterface* APIs (aka, 'virsh iface-*')? AFAICT, the udev backend only supports read operations https://gitlab.com/libvirt/libvirt/-/blob/master/src/interface/interface_backend_udev.c?ref_type=heads#L1241
(In reply to James Fehlig from comment #6) > Heh, yeah, I forgot about the udev interface backend. It, and the interface > driver as a whole, do not get any love upstream. In fact, the idea of > deprecating it has been floated > > https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/ > REKSP5LHZHC7TQ3FRDNUKRVQHJOD7GKL/#MWEFFP6W5IK4XPBKTGUB2TKTUGW3ELAK Makes sense, also netcf does not seem to get much attention anymore. In fact, I am pretty sure that error code path for netconfig (in the opensuse fork of libvirt) has never been tested, otherwise that hang would have been found much earlier as it should also occur as root if netconfig could not initialise. > > Do you use the virInterface* APIs (aka, 'virsh iface-*')? AFAICT, the udev > backend only supports read operations > > https://gitlab.com/libvirt/libvirt/-/blob/master/src/interface/ > interface_backend_udev.c?ref_type=heads#L1241 I only testd with `virsh iface-list` and `virsh iface-dumpxml`, to be honest. My use case is vagrant-libvirt, which does an enumeration of the devices (via the fog libvirt ruby library), and kept hanging when using qemu:///session for the same reason `virsh iface-list` was hanging. So, read-only is good enough. With the proposed patch (and some minor modifications to vagrant-libvirt) I was finally able to get a VM up and running via qemu:///session.
I think you are the first outside contributor of the libvirt package in all the years I've maintained it! That warrants a description of the workflow :-). The libvirt package is maintained at gitlab.suse.de/virtualization/libvirt, which requires VPN access, so no outside contributors there. This allows processing of embargoed CVEs, so packages are ready on day 0. It is maintained as a set of branches, based on a release tag, with downstream patches and cherry picked upstream fixes on top. Each branch corresponds to a maintained SUSE code base, e.g. v9.0.0-sle15sp5, v10.0.0-sle15sp6, etc. The factory branch is also maintained there, but unlike the others it's mirrored to github.com/openSUSE/libvirt. FYI, the qemu package follows a similar workflow, but allows merge requests on the github repo since there's been a history of occasional outside contribution. And final history note: your patch touches a downstream patch we've been carrying around since the SLE12 GA days (libvirt 1.2.5): https://github.com/openSUSE/libvirt/commit/b219ffadc52f30a6fa5898837a5aa7f1942222ef The udev backend was added after the downstream patch was created, and IMO we didn't properly adjust it when forward porting.
I made a small adjustment to your patch and committed it to the factory branch https://github.com/openSUSE/libvirt/commit/265837a78a5de4b19cd75c56b5119324f62474d4 leaving your SOB since you provided the actual fix. This will take care of Factory/Tumbleweed and SLE15 SP6/Leap 15.6. I see you opened the bug against Leap 15.5. Do you need the fix there as well? If so, it will require a maintenance update of the SLE15 SP5 libvirt package, which will also flow to Leap 15.5.
Final note of the day: If libnetcontrol initialization fails, 'virsh iface-list' also hangs when run as root using the qemu:///system URI.
This is an autogenerated message for OBS integration: This bug (1219986) was mentioned in https://build.opensuse.org/request/show/1148936 Factory / libvirt
(In reply to James Fehlig from comment #9) > I made a small adjustment to your patch and committed it to the factory > branch > > https://github.com/openSUSE/libvirt/commit/ > 265837a78a5de4b19cd75c56b5119324f62474d4 > > leaving your SOB since you provided the actual fix. This will take care of > Factory/Tumbleweed and SLE15 SP6/Leap 15.6. I see you opened the bug against > Leap 15.5. Do you need the fix there as well? If so, it will require a > maintenance update of the SLE15 SP5 libvirt package, which will also flow to > Leap 15.5. Awesome, thanks a lot! For 15.5: not necessarily. I created a bsdiff/patch for our systems where we need the functionality, but unless an update of libvirt is being pushed before Leap 15.6 we don't need an updated libvirt package on 15.5.
(In reply to Tiziano Müller from comment #13) > For 15.5: not necessarily. I created a bsdiff/patch for our systems where we > need the functionality, but unless an update of libvirt is being pushed > before Leap 15.6 we don't need an updated libvirt package on 15.5. I suspect there will be a libvirt update for 15.5 before 15.6 is released. Even though I agree with the change, I'd like to avoid it in SLE15 SP5 (which feeds Leap 15.5), where the monolithic daemon is still the default.
This is an autogenerated message for OBS integration: This bug (1219986) was mentioned in https://build.opensuse.org/request/show/1149655 Factory / libvirt
I don't think there's anything left to do here. Closing the bug now. Please reopen if I'm forgetting something :-).