Bug 1226664

Summary: Wicked crashing with macvlan interface since recent `zypper patch` in systemd-nspawn container
Product: [openSUSE] openSUSE Distribution
Reporter: Georg Jansing <georg.jansing>
Component: Network
Assignee: wicked maintainers <wicked-maintainers>
Status: IN_PROGRESS ---
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P2 - High
CC: Andreas.Stieger, cfamullaconrad, mt
Version: Leap 15.6
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: Script to create a container that crashes wicked

Description Georg Jansing 2024-06-20 21:53:38 UTC
Created attachment 875621
Script to create a container that crashes wicked

I have a `systemd-nspawn` based container to separate a proprietary device from the rest of the house network.

This container runs a fairly minimal openSUSE installation (initially Leap 15.4, later upgraded to 15.5).

Since a `zypper patch` this week, the `wicked` service fails to start and `wicked ifup [...]` commands segfault.

Neither upgrading to Leap 15.6 nor cleanly reinstalling the container solves the issue.

I tracked the problem down to the container sharing its outbound network interface with the server via MACVLAN. After switching to IPVLAN or removing the interface, wicked is stable again.

I attached a script that generates a very minimal container (it can do both 15.6 and 15.5) and reproduces the error for me.

Trying to get a backtrace (as in the last paragraph of https://en.opensuse.org/openSUSE:Bugreport_wicked) did not succeed. I did not find the path to the core dump.

I could attach gdb to wicked and got it to crash (also documented in the script) with the message:
```
Program received signal SIGSEGV, Segmentation fault
warning: could not convert 'si_code' from the host encoding (ANSI_X3.4-1968) to UTF-32.
This normally should not happen, please file a bug report.
```
Then it complains about a lot of missing debuginfo packages, which I could not install. Maybe I need an additional repo?
Comment 1 Andreas Stieger 2024-06-21 01:57:53 UTC
(In reply to Georg Jansing from comment #0)
> Since a `zypper patch` this week 

Which one, and when? Can you identify the update this relates to?
Are you reporting this against openSUSE-SLE-15.6-2024-1852 which updated to 0.6.75-150600.11.3.4?

Can you attempt to isolate it by downgrading wicked and related packages to 0.6.74-150600.9.2? 

zypper in --oldpackage ` \
zypper info -t patch --conflicts openSUSE-SLE-15.6-2024-1852 | \
grep " < " | while read NAME C VERSION; do \
rpm --quiet -q --queryformat "%{name}\n" $NAME && echo "${NAME}<${VERSION}"; \
done`
Comment 2 Georg Jansing 2024-06-21 06:16:41 UTC
Thanks for the downgrade command, that's a nice one :).
It actually downgraded only two packages: wicked and wicked-service.

I can confirm that version 0.6.74-150600.9.2 does not crash.
Comment 3 Clemens Famulla-Conrad 2024-07-02 09:43:33 UTC
Thanks for the report. It's an interesting use case, and I can reproduce the error.

To get a backtrace I did the following:

> $ZYPPER in gdb wicked-debugsource wicked-debuginfo systemd-coredump procps

and inside the container:

> gdb /usr/sbin/wicked
> (gdb) r --systemd ifup all
> (gdb) bt


Just to understand you correctly: what do you expect from `wicked ifup mv-eth0` (besides not crashing)? I didn't find a config in that container.
Comment 4 Georg Jansing 2024-07-02 12:04:59 UTC
> To get a backtrace I did the following: [...]

Ah, nice, good to know for the next time, thanks!

> Just to understand you correctly: what do you expect from `wicked ifup mv-eth0` (besides
> not crashing)? I didn't find a config in that container.

Oh, yes, correct. I tried that as a test in my real container, where there was an actual configuration for that interface. It seems `wicked ifup [...]` crashes with any interface name, as long as a MACVLAN interface is present in the container.
Comment 5 Clemens Famulla-Conrad 2024-07-02 12:58:17 UTC
Unfortunately, yes. The problem is that the lower device of your macvlan is reported with ifindex 2 (in a different namespace), and the macvlan device itself has ifindex 2 as well.
Currently, wicked creates a dependency loop from these equal ifindex values because it does not take the different namespaces into account.

We are looking for a solution!

Fun fact: if you use a lower interface whose ifindex isn't known inside the container (e.g. a second Ethernet device with ifindex==3), you should not hit the issue.
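
The ifindex collision described above can be checked from inside a container with a small filter over `ip -o link show` output. This is only a rough sketch: it assumes the usual iproute2 one-line format, in which a lower device living in another namespace is shown as `name@ifN`. The function name `find_ifindex_collisions` is made up for illustration and is not part of wicked.

```sh
# Hypothetical helper: reads `ip -o link show` output on stdin and reports
# interfaces whose own ifindex equals the ifindex of their lower device
# (the N in "name@ifN").
find_ifindex_collisions() {
    awk '{
        idx = $1;  sub(/:$/, "", idx)      # "2:" -> "2"
        name = $2; sub(/:$/, "", name)     # "mv-eth0@if2:" -> "mv-eth0@if2"
        n = split(name, parts, "@if")      # parts[1]=name, parts[2]=lower ifindex
        if (n == 2 && parts[2] == idx)
            printf "%s: ifindex %s equals lower ifindex %s\n", parts[1], idx, parts[2]
    }'
}
```

Inside the container it would be run as `ip -o link show | find_ifindex_collisions`. If the container matches the case described here, a line for the macvlan interface should be printed, and no line should appear when the lower device has a different ifindex.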
Comment 6 Marius Tomaschewski 2024-07-03 15:21:24 UTC
Increasing priority to P2. We're working on and testing a fix and will
submit updates ASAP to all code streams.

The SIGSEGV happens in the wicked client, which is missing a loop guard
when it logs the interface hierarchy/changes (at debug level; the same
output is visible at notice level in the new `wicked ifup --dry-run all`).
The trigger, however, is in the backend (wickedd), as described by Clemens above.
Comment 7 Marius Tomaschewski 2024-07-09 08:23:23 UTC
FYI: The fix is under review in https://github.com/openSUSE/wicked/pull/1023
and is available as test RPMs (master + fix) at:

http://download.opensuse.org/repositories/network:/wicked:/testing/
Comment 8 OBSbugzilla Bot 2024-07-18 16:55:06 UTC
This is an autogenerated message for OBS integration:
This bug (1226664) was mentioned in
https://build.opensuse.org/request/show/1188448 Factory / wicked