Bug 1218376 - conntrackd buffer size insufficient?
Summary: conntrackd buffer size insufficient?
Status: NEW
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Network
Version: Leap 15.5
Hardware: x86-64 openSUSE Leap 15.5
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: Adam Majer
QA Contact: E-mail List
 
Reported: 2023-12-24 05:07 UTC by Georg Pfuetzenreuter
Modified: 2024-01-03 15:00 UTC (History)
CC List: 3 users



Attachments
journal from the 24th (160.28 KB, text/plain)
2023-12-27 12:12 UTC, Georg Pfuetzenreuter
Details
asgard2 domain XML (7.08 KB, application/xml)
2024-01-03 14:14 UTC, Georg Pfuetzenreuter
Details

Description Georg Pfuetzenreuter 2023-12-24 05:07:45 UTC
Hi,

we run a pair of openSUSE Leap machines in our openSUSE infrastructure acting as firewalls and utilize conntrackd on them.
I noticed that after rebooting one node, its initial resync is aborted by systemd:

```
systemd[1]: conntrackd.service: Watchdog timeout (limit 1min)!
```

I tried increasing the `WatchdogSec` parameter in a systemd override, but it seems even 240 seconds do not help:

```
Dec 24 04:46:54 asgard2 systemd[1]: Started Connection tracking userspace daemon.
Dec 24 04:46:54 asgard2 conntrack-tools[3196]: using user-space event filtering
Dec 24 04:46:54 asgard2 conntrack-tools[3196]: netlink event socket buffer size has been set to 262142 bytes
Dec 24 04:46:54 asgard2 conntrack-tools[3196]: initialization completed
Dec 24 04:46:54 asgard2 conntrack-tools[3196]: -- starting in console mode --
Dec 24 04:46:54 asgard2 conntrack-tools[3196]: resync requested
Dec 24 04:46:54 asgard2 conntrackd[3196]: [Sun Dec 24 04:46:54 2023] (pid=3196) [notice] netlink event socket buffer size has been doubled to 524284 bytes
Dec 24 04:46:54 asgard2 conntrack-tools[3196]: netlink event socket buffer size has been doubled to 524284 bytes
Dec 24 04:46:54 asgard2 conntrackd[3196]: [Sun Dec 24 04:46:54 2023] (pid=3196) [warning] netlink event socket buffer size cannot be doubled further since it will exceed NetlinkBufferSizeMaxGrowth. We are likely to be losing events, this may lead to unsynchronized replicas. Please, consider increasing netlink socket buffer size via NetlinkBufferSize and NetlinkBufferSizeMaxGrowth clauses in conntrackd.conf
Dec 24 04:46:54 asgard2 conntrack-tools[3196]: netlink event socket buffer size cannot be doubled further since it will exceed NetlinkBufferSizeMaxGrowth. We are likely to be losing events, this may lead to unsynchronized replicas. Please, consider increasing netlink socket buffer size via NetlinkBufferSize and NetlinkBufferSizeMaxGrowth clauses in conntrackd.conf
Dec 24 04:46:41 asgard2 systemd[1]: conntrackd.service: Watchdog timeout (limit 4min)!
Dec 24 04:46:41 asgard2 systemd[1]: conntrackd.service: Killing process 3196 (conntrackd) with signal SIGABRT.
Dec 24 04:46:41 asgard2 systemd[1]: conntrackd.service: Main process exited, code=dumped, status=6/ABRT
Dec 24 04:46:41 asgard2 systemd[1]: conntrackd.service: Failed with result 'watchdog'.
Dec 24 04:46:41 asgard2 systemd[1]: conntrackd.service: Scheduled restart job, restart counter is at 1.
Dec 24 04:46:41 asgard2 systemd[1]: Stopped Connection tracking userspace daemon.
Dec 24 04:46:41 asgard2 systemd[1]: Starting Connection tracking userspace daemon...
```

I'm unsure whether submitting a patch to conntrackd.service (in the conntrack-tools package) to increase `WatchdogSec` to a very high number is a good idea, as the long startup only seems to be needed for the initial resync and not during regular operation, and I could not find a systemd setting for a dedicated startup timeout for the unit.
After the service automatically restarts, it does not time out a second time.

Is there something that could be tuned here to avoid the initial startup failure?
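
For reference, the override was applied as a systemd drop-in along these lines (created via `systemctl edit conntrackd.service`; the value shown is the one from the log above):

```
# /etc/systemd/system/conntrackd.service.d/override.conf
[Service]
WatchdogSec=240
```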
Comment 1 Adam Majer 2023-12-27 09:34:49 UTC
Have you tried following the recommended setting changes from the log messages? What values have you set?
Comment 2 Georg Pfuetzenreuter 2023-12-27 11:55:34 UTC
Hi Adam,

I currently use:

```
	NetlinkBufferSize 10064960
	NetlinkBufferSizeMaxGrowth 20129920
```

This is about ten times more than what was set originally and the issue persists.
Comment 3 Jan Engelhardt 2023-12-27 11:56:52 UTC
>Dec 24 04:46:54 asgard2 systemd[1]: Started Connection tracking userspace daemon.
>Dec 24 04:46:41 asgard2 systemd[1]: conntrackd.service: Watchdog timeout (limit 4min)!

Sorry, we don't support universes where time goes backwards.
Comment 4 Adam Majer 2023-12-27 12:06:00 UTC
(In reply to Jan Engelhardt from comment #3)
> >Dec 24 04:46:54 asgard2 systemd[1]: Started Connection tracking userspace daemon.
> >Dec 24 04:46:41 asgard2 systemd[1]: conntrackd.service: Watchdog timeout (limit 4min)!
> 
> Sorry, we don't support universes where time goes backwards.

Nice catch. I wonder if this is part of the problem. Could the RTC be wrong on bootup causing this issue? Do you see cases where this happens when the clock is correct?
Comment 5 Georg Pfuetzenreuter 2023-12-27 12:06:55 UTC
System clock got corrected during boot, in between the start of conntrackd and the watchdog kill:

```
Dec 24 04:46:59 asgard2 chronyd[3256]: Selected source 2a07:de40:b27e:1203::12 (ntp2.infra.opensuse.org)
Dec 24 04:46:59 asgard2 chronyd[3256]: System clock wrong by -256.858925 seconds
Dec 24 04:42:42 asgard2 chronyd[3256]: System clock was stepped by -256.858925 seconds
```
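
(For reference: one way to bound the impact of a large boot-time offset is to let chrony step the clock during its first updates rather than slewing slowly; a minimal chrony.conf fragment, assuming otherwise default settings:)

```
# /etc/chrony.conf (excerpt)
# Step the clock if the offset exceeds 1 second,
# but only during the first 3 clock updates after startup.
makestep 1.0 3
```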
Comment 6 Georg Pfuetzenreuter 2023-12-27 12:12:22 UTC
Created attachment 871571 [details]
journal from the 24th
Comment 7 Adam Majer 2023-12-27 12:16:23 UTC
Yes, I understand, but does the timeout behaviour also happen in other instances, where the clock is not wrong?

If yes (still see timeouts), could you update to current version 1.4.8 to check if the problem persists? That would be in security:netfilter/conntrack-tools
Comment 8 Georg Pfuetzenreuter 2023-12-27 12:20:03 UTC
> Yes, I understand, but does the timeout behaviour also happen in other instances, where the clock is not wrong?

Will try on the next boot and let you know.

> If yes (still see timeouts), could you update to current version 1.4.8 to check if the problem persists?

I can, if it is not a problem to run different versions on both nodes temporarily?
Comment 9 Adam Majer 2023-12-27 12:23:06 UTC
(In reply to Georg Pfuetzenreuter from comment #8)
> > If yes (still see timeouts), could you update to current version 1.4.8 to check if the problem persists?
> 
> I can, if it is not a problem to run different versions on both nodes
> temporarily?

These versions should be compatible.
Comment 10 Georg Pfuetzenreuter 2024-01-03 11:07:51 UTC
It seems the clock drift was caused by an unsynchronized hypervisor clock. I corrected it now, so the VM in question no longer performs clock stepping during boot.

After this, I can no longer observe any watchdog timeouts, conntrackd starts on the first attempt:

```
# journalctl -S today -u conntrackd
Jan 03 10:53:21 asgard2 systemd[1]: Starting Connection tracking userspace daemon...
Jan 03 10:53:21 asgard2 conntrackd[3109]: [Wed Jan  3 10:53:21 2024] (pid=3109) [notice] disabling external cache
Jan 03 10:53:21 asgard2 conntrack-tools[3109]: disabling external cache
Jan 03 10:53:21 asgard2 conntrackd[3109]: [Wed Jan  3 10:53:21 2024] (pid=3109) [notice] using user-space event filtering
Jan 03 10:53:21 asgard2 conntrackd[3109]: [Wed Jan  3 10:53:21 2024] (pid=3109) [notice] netlink event socket buffer size has been set to 262142 bytes
Jan 03 10:53:21 asgard2 conntrackd[3109]: [Wed Jan  3 10:53:21 2024] (pid=3109) [notice] initialization completed
Jan 03 10:53:21 asgard2 conntrackd[3109]: [Wed Jan  3 10:53:21 2024] (pid=3109) [notice] -- starting in console mode --
Jan 03 10:53:21 asgard2 conntrackd[3109]: [Wed Jan  3 10:53:21 2024] (pid=3109) [notice] resync requested
Jan 03 10:53:21 asgard2 conntrack-tools[3109]: using user-space event filtering
Jan 03 10:53:21 asgard2 conntrack-tools[3109]: netlink event socket buffer size has been set to 262142 bytes
Jan 03 10:53:21 asgard2 conntrack-tools[3109]: initialization completed
Jan 03 10:53:21 asgard2 conntrack-tools[3109]: -- starting in console mode --
Jan 03 10:53:21 asgard2 conntrack-tools[3109]: resync requested
Jan 03 10:53:21 asgard2 systemd[1]: Started Connection tracking userspace daemon.
Jan 03 10:53:21 asgard2 conntrackd[3109]: [Wed Jan  3 10:53:21 2024] (pid=3109) [notice] netlink event socket buffer size has been doubled to 524284 bytes
Jan 03 10:53:21 asgard2 conntrack-tools[3109]: netlink event socket buffer size has been doubled to 524284 bytes
Jan 03 10:53:21 asgard2 conntrackd[3109]: [Wed Jan  3 10:53:21 2024] (pid=3109) [warning] netlink event socket buffer size cannot be doubled further since it will exceed NetlinkBufferSizeMaxGrowth. We are likely to be losing events, this may lead to unsynchronized replicas. Please, consider increasing netlink socket buffer size via NetlinkBufferSize and NetlinkBufferSizeMaxGrowth clauses in conntrackd.conf
Jan 03 10:53:21 asgard2 conntrack-tools[3109]: netlink event socket buffer size cannot be doubled further since it will exceed NetlinkBufferSizeMaxGrowth. We are likely to be losing events, this may lead to unsynchronized replicas. Please, consider increasing netlink socket buffer size via NetlinkBufferSize and NetlinkBufferSizeMaxGrowth clauses in conntrackd.conf
```

Thank you very much for pointing me in the right direction.

I am still concerned about the NetlinkBufferSize warnings, but would say they're out of scope here.
Comment 11 Adam Majer 2024-01-03 11:31:56 UTC
(In reply to Georg Pfuetzenreuter from comment #2)
> Hi Adam,
> 
> I currently use:
> 
> ```
> 	NetlinkBufferSize 10064960
> 	NetlinkBufferSizeMaxGrowth 20129920
> ```
> 
> This is about ten times more than what was set originally and the issue
> persists.

	NetlinkBufferSize 10064960
	NetlinkBufferSizeMaxGrowth 20129920

Add a zero to the MaxGrowth value and see what happens. The warning you are seeing is generated when BufferSize * 2 > BufferSizeMaxGrowth, which happens rather quickly. Sadly, this warning is displayed only once, even if the condition persists.

How many connections is it trying to sync?
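
The growth logic described above can be sketched roughly as follows (a simplified model of the warning condition, not conntrackd's actual implementation):

```python
# Simplified model of the netlink buffer growth described above:
# on overrun the buffer is doubled, but only while the doubled size
# stays within NetlinkBufferSizeMaxGrowth.
def next_buffer_size(current: int, max_growth: int):
    """Return the doubled buffer size, or None once doubling would
    exceed max_growth (the point where the warning is emitted)."""
    doubled = current * 2
    if doubled > max_growth:
        return None  # the "cannot be doubled further" warning path
    return doubled

# With the values from comment 2, a single doubling already reaches the cap:
print(next_buffer_size(10064960, 20129920))  # 20129920
print(next_buffer_size(20129920, 20129920))  # None
```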
Comment 12 Georg Pfuetzenreuter 2024-01-03 11:48:17 UTC
Added a 0, unfortunately no change:

```
        NetlinkBufferSize 10064960
        NetlinkBufferSizeMaxGrowth 201299200

Jan 03 11:41:07 asgard2 conntrackd[3094]: [Wed Jan  3 11:41:07 2024] (pid=3094) [notice] netlink event socket buffer size has been doubled to 524284 bytes
Jan 03 11:41:07 asgard2 conntrackd[3094]: [Wed Jan  3 11:41:07 2024] (pid=3094) [warning] netlink event socket buffer size cannot be doubled further since it will exceed NetlinkBufferSizeMaxGrowth. We are likely to be losing events, this may lead to unsynchronized replicas. Please, consider increasing netlink socket buffer size via NetlinkBufferSize and NetlinkBufferSizeMaxGrowth clauses in conntrackd.conf
```

Below are the statistics, including connection counts.

Primary:

```
asgard1 # conntrackd -i |wc -l
57521
asgard1 # wc -l /proc/net/nf_conntrack
28397 /proc/net/nf_conntrack
asgard1 (Firewall, Router):~ # conntrackd -s
cache internal:
current active connections:            61311
connections created:               358021997    failed:     14217661
connections updated:              2073835016    failed:            0
connections destroyed:             357960686    failed:            0

external inject:
connections created:                65759864    failed:            0
connections updated:                47787341    failed:            1
connections destroyed:                829393    failed:            0

traffic processed:
                   0 Bytes                         0 Pckts

multicast traffic (active device=os-asgard):
        557760735600 Bytes sent          16236662192 Bytes recv
          1741749095 Pckts sent             30735608 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs            260390609 Lost msgs
```

Secondary (that's trying to sync):

```
asgard2 # conntrackd -i |wc -l
48
asgard2 # wc -l /proc/net/nf_conntrack
28976 /proc/net/nf_conntrack
asgard2 # conntrackd -s
cache internal:
current active connections:               85
connections created:                      87    failed:            0
connections updated:                      95    failed:            0
connections destroyed:                     2    failed:            0

external inject:
connections created:                   90157    failed:            0
connections updated:                  686213    failed:            0
connections destroyed:                 10934    failed:            0

traffic processed:
                   0 Bytes                         0 Pckts

multicast traffic (active device=os-asgard):
              140196 Bytes sent             72290060 Bytes recv
                1894 Pckts sent               431211 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs              1648886 Lost msgs
```
Comment 13 Adam Majer 2024-01-03 12:12:12 UTC
It's a little strange. It's not that many connections.

You've said this is running in a VM. What is the load on the host and in the VM? Would it be possible for this VM to have higher scheduling priority? Maybe pin it to some specific core(s)?

If CPU load is not an issue, it could be interesting to see what the cause is here.

Can you attach the config file for the VM, or the libvirt/QEMU command line with the parameters it is started with?
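
If pinning turns out to be worth trying, a libvirt `<cputune>` fragment along these lines could be added to the domain XML (the vCPU-to-host-core mapping below is purely illustrative):

```
<cputune>
  <vcpupin vcpu="0" cpuset="16"/>
  <vcpupin vcpu="1" cpuset="17"/>
  <vcpupin vcpu="2" cpuset="18"/>
  <vcpupin vcpu="3" cpuset="19"/>
</cputune>
```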
Comment 14 Georg Pfuetzenreuter 2024-01-03 14:12:11 UTC
Thanks for updating the bug for this.

For load inside the VM, maybe the dashboard is helpful:
https://monitor.opensuse.org/grafana/d/rYdddlPWk/node-metrics?orgId=1&var-DS_PROMETHEUS=default&var-job=nodes&var-node=asgard2.infra.opensuse.org

The hypervisor is not visualized there, but it is mostly idling. This VM is the only one running on it, and only the virtualization-relevant services are running on the host system.

Here is some output from said hypervisor:

```
orbit21:~ # ps -eo pcpu,args|grep qemu
29.3 /usr/bin/qemu-system-x86_64 -name guest=asgard2.infra.opensuse.org,process=qemu:asgard2.infra.opensuse.org,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-asgard2.infra.opensu/master-key.aes"} -blockdev {"driver":"file","filename":"/usr/share/qemu/ovmf-x86_64-opensuse-code.bin","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"} -blockdev {"driver":"file","filename":"/data0/kvm/nvram/asgard2.infra.opensuse.org.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"} -machine pc-q35-7.1,usb=off,smm=on,dump-guest-core=off,memory-backend=pc.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format -accel kvm -cpu EPYC-Rome,ht=on,monitor=on,xsaves=on,cmp-legacy=on,extapic=on,ibs=on,skinit=on,wdt=on,tce=on,topoext=on,perfctr-nb=on,invtsc=on,ibrs=on,amd-ssbd=on,lbrv=on,svm-lock=on,tsc-scale=on,vmcb-clean=on,flushbyasid=on,decodeassists=on,pause-filter=on,pfthreshold=on,avic=on,v-vmsave-vmload=on,vgif=on -global driver=cfi.pflash01,property=secure,value=on -m 5860 -object {"qom-type":"memory-backend-ram","id":"pc.ram","size":6144655360} -overcommit mem-lock=off -smp 4,maxcpus=6,sockets=6,cores=1,threads=1 -uuid 4753886d-6569-4b40-b6b7-9818f5a66acd -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=26,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device {"driver":"pcie-root-port","port":8,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x1"} -device {"driver":"pcie-root-port","port":9,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x1.0x1"} -device 
{"driver":"pcie-root-port","port":10,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x1.0x2"} -device {"driver":"pcie-root-port","port":11,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x1.0x3"} -device {"driver":"pcie-root-port","port":12,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x1.0x4"} -device {"driver":"pcie-root-port","port":13,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x1.0x5"} -device {"driver":"pcie-root-port","port":14,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x1.0x6"} -device {"driver":"qemu-xhci","id":"usb","bus":"pci.2","addr":"0x0"} -device {"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.3","addr":"0x0"} -blockdev {"driver":"file","filename":"/data0/kvm/disks/asgard2.infra.opensuse.org_root.qcow2","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null} -device {"driver":"virtio-blk-pci","bus":"pci.4","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk1","bootindex":1,"write-cache":"on"} -netdev tap,fd=27,vhost=on,vhostfd=29,id=hostnet0 -device {"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"d2:e1:4b:98:46:2f","bus":"pci.1","addr":"0x0"} -chardev pty,id=charserial0 -device {"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0} -chardev socket,id=charchannel0,fd=25,server=on,wait=off -device {"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"} -audiodev {"id":"audio1","driver":"none"} -device {"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.5","addr":"0x0"} -object {"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"} -device {"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.6","addr":"0x0"} -sandbox 
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

orbit21:~ # free -h
               total        used        free      shared  buff/cache   available
Mem:           503Gi       5.3Gi       500Gi        16Mi       278Mi       498Gi
Swap:           31Gi          0B        31Gi

orbit21:~ # virsh nodeinfo
CPU model:           x86_64
CPU(s):              32
CPU frequency:       1500 MHz
CPU socket(s):       1
Core(s) per socket:  16
Thread(s) per core:  1
NUMA cell(s):        2
Memory size:         528090836 KiB

orbit21:~ # virsh vcpuinfo asgard2.infra.opensuse.org
VCPU:           0
CPU:            16
State:          running
CPU time:       159.3s
CPU Affinity:   yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

VCPU:           1
CPU:            16
State:          running
CPU time:       1177.8s
CPU Affinity:   yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

VCPU:           2
CPU:            17
State:          running
CPU time:       1760.1s
CPU Affinity:   yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

VCPU:           3
CPU:            19
State:          running
CPU time:       54.1s
CPU Affinity:   yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

orbit21:~ # virsh dommemstat asgard2.infra.opensuse.org
actual 4000000
swap_in 0
swap_out 0
major_fault 84
minor_fault 92503
unused 5597588
available 5770056
usable 5499912
last_update 1704282057
disk_caches 58184
hugetlb_pgalloc 0
hugetlb_pgfail 0
rss 1825204
```

The template for the Libvirt domain XML file can be found here:
https://github.com/openSUSE/heroes-salt/blob/production/salt/files/libvirt/domains/common.xml.j2
I will also attach the live/dumpxml version shortly.

I'm happy to allocate more resources to the VM, or to adjust its configuration for scheduling/core pinning, if you still think it could help based on the above.
Comment 15 Georg Pfuetzenreuter 2024-01-03 14:14:57 UTC
Created attachment 871637 [details]
asgard2 domain XML
Comment 16 Adam Majer 2024-01-03 14:27:38 UTC
(In reply to Georg Pfuetzenreuter from comment #14)
> The hypervisor is not visualized there, but it is mostly idling. This VM is
> the only one running on it and there are only the virtualization relevant
> services running on the host system.

Thank you for the information. It seems my guess that CPU load could be the cause was not correct.

> I'm happy to allocate more resources to the VM, or to adjust its
> configuration for scheduling/core pinning, if you still think it could help
> based on the above.

It's probably not related at all. I'll try to duplicate this locally.
Comment 17 Georg Pfuetzenreuter 2024-01-03 15:00:27 UTC
I appreciate the help. If you need any further information, let me know.