Bug 1211898

Summary: stress-ng fsize failure on NFSv3
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Richard Palethorpe <richard.palethorpe>
Component: Kernel
Assignee: openSUSE Kernel Bugs <kernel-bugs>
Status: NEW ---
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: nfbrown
Version: Current
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Richard Palethorpe 2023-06-01 09:12:40 UTC
While setting up some OpenQA NFS testing I came across the following error with stress-ng.

$ stress-ng --sequential -1 --timeout 3 --class filesystem
...
stress-ng: fail:  [3866] fsize: fallocate unexpectedly succeeded at offset 262144 (0x40000), expecting EFBIG error
stress-ng: fail:  [3866] fsize: expected a SIGXFSZ signal at offset 262144 (0x40000), nothing happened
stress-ng: info:  [3866] fsize: fallocate unexpectedly succeeded at offset 106797 (0x1a12d), expecting EFBIG error
stress-ng: info:  [3866] fsize: fallocate unexpectedly succeeded at offset 1 (0x1), expecting EFBIG error
stress-ng: info:  [3866] fsize: fallocate unexpectedly succeeded at offset 3 (0x3), expecting EFBIG error
stress-ng: info:  [3866] fsize: fallocate unexpectedly succeeded at offset 7 (0x7), expecting EFBIG error
stress-ng: info:  [3866] fsize: fallocate unexpectedly succeeded at offset 15 (0xf), expecting EFBIG error
stress-ng: info:  [3866] fsize: fallocate unexpectedly succeeded at offset 31 (0x1f), expecting EFBIG error
stress-ng: info:  [3866] fsize: fallocate unexpectedly succeeded at offset 63 (0x3f), expecting EFBIG error
...

This does not happen with NFSv4. It happens with both NFSv3 sync and async. I don't see any other errors.

The tests have not been merged or scheduled on the main OpenQA instance yet. When they are, I can post a link. https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/17181
Comment 1 Neil Brown 2023-06-26 06:55:42 UTC
The timeout setting is too small.  Use a bigger number.

NFSv3 (and NFSv4.1, but not NFSv4.2) does not implement fallocate().
So stress-ng uses a "shim_emulate_fallocate()" instead, which writes data.

shim_emulate_fallocate() stops trying to write if keep_stressing_flag() fails.
One of the things that causes this to fail is delivery of SIGALRM, which happens when the timeout expires.

So after the timeout, a fallocate attempt will appear to succeed.  This is arguably a bug in stress-ng.

NFS is behaving correctly.  stress-ng is not getting an error because the way it emulates fallocate is not reliable.
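
To illustrate the failure mode described above, here is a minimal Python model of a write-based fallocate fallback. It is not the stress-ng code: emulate_fallocate(), the keep_going callback (standing in for keep_stressing_flag()), CHUNK and demo() are hypothetical names made up for this sketch, and the real shim is written in C. The point is only that once the "keep going" check fails, the loop returns success without ever writing near the requested offset, so a file size limit is never exercised.

```python
import os
import tempfile

CHUNK = 4096  # hypothetical write size, not taken from stress-ng


def emulate_fallocate(fd, offset, length, keep_going):
    """Model of a fallocate fallback that writes zeros in chunks.

    Returns 0 on "success". Errors raised by os.pwrite (for example
    EFBIG) propagate to the caller as OSError.
    """
    end = offset + length
    pos = offset
    zeros = bytes(CHUNK)
    while pos < end:
        if not keep_going():
            # Early exit: the range up to `end` was not written, so no
            # size limit was tested, yet the caller still sees success.
            return 0
        pos += os.pwrite(fd, zeros[: min(CHUNK, end - pos)], pos)
    return 0


def demo():
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        # Simulate the timeout having fired: the flag is already cleared.
        rc_stopped = emulate_fallocate(fd, 0, 262144, lambda: False)
        size_stopped = os.fstat(fd).st_size
        # The same request while the run is still active.
        rc_active = emulate_fallocate(fd, 0, 262144, lambda: True)
        size_active = os.fstat(fd).st_size
    return rc_stopped, size_stopped, rc_active, size_active
```

In this model both calls return 0, but only the second one grows the file. A caller that expects an EFBIG error from the first call has no way to tell that the emulation gave up early, which matches the "unexpectedly succeeded" messages in the report.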