Bugzilla – Bug 1216201
NFS Client is broken in MicroOS since 10.10
Last modified: 2023-10-16 22:36:21 UTC
Hello,

It seems like the latest nfs-client update broke the nfs client in some cases.

Context: I'm running multiple Kubernetes clusters on top of OpenSUSE MicroOS using the following Terraform module: https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner

I'm using Rancher Longhorn for volumes in the cluster and Longhorn's RWX volumes are using nfs under the hood.

Yesterday, a transactional-update scheduled update upgraded all my nodes (including nfs-client package from 2.6.3-39.4 to 2.6.3-39.5) and the RWX volumes started to fail. The error message was the following in the journal:

---
Oct 11 07:42:23 dev-worker-1 k3s[1294]: Mounting command: /usr/local/sbin/nsmounter
Oct 11 07:42:23 dev-worker-1 k3s[1294]: Mounting arguments: mount -t nfs -o vers=4.1,noresvport,intr,hard 10.43.207.185:/pvc-13538170-4278-4467-b2b0-1f1ba6f54a4c /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/185c34f566c2eca6e8c7c6a2ede2094c076d7d25ddae286dc633eeef80551af0/globalmount
Oct 11 07:42:23 dev-worker-1-autoscaled-small-19baf778f50efd8c k3s[1294]: Output: mount.nfs: Protocol not supported for 10.43.207.185:/pvc-13538170-4278-4467-b2b0-1f1ba6f54a4c on /var/lib/kubelet/plugin
---

I also enabled debug logging in the kernel, you can find those logs in the related bug report.

Anyway, after a few hours I decided to use MicroOS rollback going back to the latest snapshot from 2 days ago and the issue is solved now on that node.

Related bug reports on Github:
https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/1016
https://github.com/longhorn/longhorn/issues/6857

Please let me know if I can provide you any additional information about this issue. Since my comment in https://bugzilla.opensuse.org/show_bug.cgi?id=1214540 multiple users reported the same issue.

Regards,
Janos Miko
There is no change to the nfs-client package between 2.6.3-39.4 and 2.6.3-39.5. It appears that the package was rebuilt, possibly because some package that it depends on changed. So it seems likely that some shared library that nfs uses was also updated. Do you have a list of the other packages that were changed during the update which caused the breakage?
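One possible way to collect that list is sketched below. It assumes the zypp history log lives at /var/log/zypp/history and uses pipe-separated fields (timestamp|action|name|version|arch|...); the helper name changed_packages is hypothetical and only for illustration.

```shell
#!/bin/sh
# Sketch: print "name version" for every package installed on a given day,
# read from a zypp history log.
# Assumption: fields are pipe-separated as timestamp|action|name|version|arch|...
# changed_packages is a hypothetical helper name, not part of zypper.
changed_packages() {
    day="$1"                              # date prefix, e.g. 2023-10-11
    log="${2:-/var/log/zypp/history}"     # assumed default log location
    awk -F'|' -v day="$day" '
        /^#/ { next }                     # skip comment lines
        index($1, day) == 1 && $2 == "install" { print $3, $4 }
    ' "$log"
}
```

For example, `changed_packages 2023-10-11` would list what was installed on the day of the breakage, if the log is where this sketch assumes.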
I had a look through the kernel logs in https://github.com/longhorn/longhorn/issues/6857#issuecomment-1758393346 and it looks like a kernel problem. You probably updated to kernel-default-6.5.6-1.1.x86_64 as that was released recently. Maybe that has a problem for your particular use case. At a guess, it might be related to the use of the noresvport option, as I doubt that it is tested much. I'll try experimenting...
I tried to downgrade the kernel on all the nodes. But that doesn't seem to help.
~ # transactional-update shell
transactional update # zypper install -y --oldpackage https://download.opensuse.org/history/20231008/tumbleweed/repo/oss/x86_64/kernel-default-6.5.4-1.1.x86_64.rpm
transactional update # zypper addlock kernel-default
transactional update # exit
~ # touch /var/run/reboot-required
And I waited for the reboot. And the nfs-client version is still the same.
~ # zypper info nfs-client
Loading repository data...
Reading installed packages...
Information for package nfs-client:
-----------------------------------
Repository : openSUSE-Tumbleweed-Oss
Name : nfs-client
Version : 2.6.3-39.5
Arch : x86_64
Vendor : openSUSE
Installed Size : 874.5 KiB
Installed : Yes
Status : out-of-date (version 2.6.3-39.4 installed)
Hm, forget my previous comment. After all nodes restarted it looks like it's working. When I wrote my previous comment I only checked that the pod is running on an already restarted node - that uses the previous kernel. Now all my nodes are really downgraded and rebooted and it works. So the issue occurs with kernel-default-6.5.6-1.1 and does not occur with kernel-default-6.5.4-1.1.
I believe this bug is fixed by https://lore.kernel.org/all/20231009145901.99260-1-olga.kornievskaia@gmail.com/ I have submitted that patch so that the next kernel released for Tumbleweed should have the fix. Until then, please use the 6.5.4 kernel.
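For anyone checking their nodes in the meantime, here is a minimal sketch that flags the kernel reported broken in this bug. It assumes `uname -r` prints a release string beginning with "6.5.6-" on affected nodes; only 6.5.6-1.1 (broken) and 6.5.4-1.1 (working) were confirmed above, so other versions are not covered. The function name is_reported_broken is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) if the given kernel release string matches
# the version reported broken in this bug.
# Assumption: affected releases start with "6.5.6-".
# is_reported_broken is a hypothetical helper name.
is_reported_broken() {
    case "$1" in
        6.5.6-*) return 0 ;;   # matches the kernel reported broken here
        *)       return 1 ;;   # anything else is not covered by this report
    esac
}

# Check the kernel that is currently running.
if is_reported_broken "$(uname -r)"; then
    echo "running kernel matches the version reported broken in this bug"
fi
```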