Bugzilla – Bug 1202330
[Build 20220810] openQA test fails in fanotify10
Last modified: 2022-12-14 17:46:42 UTC
## Observation

fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:538: TFAIL: group 0 (4) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 1 (4) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 2 (4) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 0 (0) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 1 (0) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 2 (0) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 0 (e00) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 1 (e00) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 2 (e00) with FAN_MARK_MOUNT did not get event
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:538: TFAIL: group 0 (4) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 1 (4) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 2 (4) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 0 (0) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 1 (0) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 2 (0) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 0 (e00) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 1 (e00) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 2 (e00) with FAN_MARK_FILESYSTEM did not get event
Failed.
Test took approximately 3.23233270202763 seconds
Some test output could not be parsed: 14 lines were ignored.

openQA test in scenario opensuse-Tumbleweed-JeOS-for-kvm-and-xen-x86_64-jeos-ltp-syscalls@uefi_virtio-2G fails in
[fanotify10](https://openqa.opensuse.org/tests/2508855/modules/fanotify10/steps/7)

## Test suite description

## Reproducible

Fails since (at least) Build [20220810](https://openqa.opensuse.org/tests/2508674)

## Expected result

Last good: [20220809](https://openqa.opensuse.org/tests/2507021) (or more recent)

## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=JeOS-for-kvm-and-xen&machine=uefi_virtio-2G&test=jeos-ltp-syscalls&version=Tumbleweed)
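For readers unfamiliar with the numbers in the `Unexpected inode mark` lines of the observation, here is a minimal sketch that decodes them into flag names. It assumes the logged values are hexadecimal (the log does not state the radix) and uses only the few bit values from `<linux/fanotify.h>` needed for this log; the `decode` helper and the two tables are illustrative names, not part of LTP.

```python
# Bit values copied from <linux/fanotify.h>. Only the bits needed for this
# log are listed. Treating the logged numbers as hex is an assumption.
MARK_FLAGS = {
    0x040: "FAN_MARK_IGNORED_SURV_MODIFY",
    0x200: "FAN_MARK_EVICTABLE",
}
EVENT_BITS = {
    0x00000020: "FAN_OPEN",
    0x08000000: "FAN_EVENT_ON_CHILD",
}


def decode(value, names):
    """Return the names of the bits set in value, lowest bit first."""
    found = [name for bit, name in sorted(names.items()) if value & bit]
    known = sum(bit for bit in names if value & bit)
    if value & ~known:
        found.append(hex(value & ~known))  # keep bits we cannot name
    return found


print("mflags       =", decode(0x240, MARK_FLAGS))
print("mask         =", decode(0x8000020, EVENT_BITS))
print("ignored_mask =", decode(0x20, EVENT_BITS))
```

Read this way, the leftover mark is an evictable inode mark that ignores `FAN_OPEN`, which fits the later discussion of evictable ignore masks.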
This failure is due to unreliable testing of evictable ignore masks. I believe Amir is working with LTP upstream to resolve the issue, but let me check for progress... Hmm, it probably got forgotten. The last email I've found regarding this is: https://lore.kernel.org/all/CAOQ4uxiJ2kb42XzQc8P2cZ6LKdrYNK3-P9u_cLS_WHYx4LzwzA@mail.gmail.com

I'll have a look at what we can do here, but in either case it is a test issue, not a bug.
Dominique, I have tried but I'm not able to reproduce the issue locally. The underlying problem is that for some reason "echo 3 >/proc/sys/vm/drop_caches" does not evict the inode we need. The attached patch might help. Are you able to test it in openQA?
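To check by hand whether an inode mark survived a cache drop, something along these lines could be used. It is only a sketch: `drop_caches` and `inode_marks` are hypothetical helper names, the parser relies on the `fanotify ino:...` line format of `/proc/<pid>/fdinfo/<fd>` described in the kernel's proc documentation, and writing to `/proc/sys/vm/drop_caches` needs root.

```python
import os


def drop_caches(path="/proc/sys/vm/drop_caches"):
    """Sync, then ask the kernel to drop page cache and slab objects."""
    os.sync()  # drop_caches does not free dirty objects, so write them back first
    with open(path, "w") as f:
        f.write("3\n")


def inode_marks(fdinfo_text):
    """Return one dict per 'fanotify ino:' line of an fdinfo dump."""
    marks = []
    for line in fdinfo_text.splitlines():
        if not line.startswith("fanotify ino:"):
            continue
        # Tokens look like "ino:4f969" or "mflags:240"; values stay as hex strings.
        tokens = line.split()[1:]
        marks.append(dict(tok.split(":", 1) for tok in tokens if ":" in tok))
    return marks
```

After calling `drop_caches()`, reading `/proc/self/fdinfo/<fd>` for the fanotify group's file descriptor and passing the text to `inode_marks()` should give an empty list if the evictable mark really went away; a remaining entry corresponds to what the test reports as an unexpected inode mark.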
Created attachment 861026
[PATCH] syscalls/fanotify10: Make evictable marks test more reliable
The fix has now been merged into the LTP repository, so let's close this as hopefully fixed.
https://openqa.opensuse.org/tests/2627241#step/fanotify10/7

This still seems to be a problem, and two weeks have passed. Is that 'expected'?
I can see my fix was merged into the upstream LTP repository on Aug 28 as commit 48cfd7a9977e ("syscalls/fanotify10: Make evictable marks test more reliable"). Looking at the openQA log, it already seems to be using LTP as of commit 14e31797926a, which happened after that, so it should indeed contain the fix. Hrm... needs more poking...
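Commit dates alone do not prove that one commit contains another, so the ancestry can be confirmed in an LTP checkout with `git merge-base --is-ancestor`. The wrapper below is a minimal sketch; `contains_commit` and its `run` parameter are illustrative, and the repository path in the usage note is an assumption.

```python
import subprocess


def contains_commit(repo, fix, head, run=subprocess.run):
    """True if commit `fix` is an ancestor of `head` in the checkout at `repo`."""
    result = run(["git", "-C", repo, "merge-base", "--is-ancestor", fix, head])
    # git exits 0 for "is an ancestor", 1 for "is not", anything else on error.
    if result.returncode not in (0, 1):
        raise RuntimeError(f"git merge-base failed with status {result.returncode}")
    return result.returncode == 0
```

For example, `contains_commit("ltp", "48cfd7a9977e", "14e31797926a")` would return `True` if the LTP build used by openQA includes the fix, given a clone in a directory named `ltp`.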
Dominique, I got back to this, but despite my efforts I was not able to reproduce the failure in my VM. Although I can see in the code some ways reclaim could bail out early without evicting the inode we want to evict, I don't see any obvious condition that should trigger in openQA. Would it be possible for me to connect to the openQA VM and reproduce and debug the issue there?
Ping Dominique?
(In reply to Jan Kara from comment #8)
> Ping Dominique?

Apologies, I was 'out of order' for the last 2.5 weeks.

Technically, the disk image used to run the tests can be downloaded from openQA, in the relevant tests under Assets, i.e.
https://openqa.opensuse.org/tests/2797490# => https://openqa.opensuse.org/tests/2797490/asset/hdd/openSUSE-Tumbleweed-Minimal-VM.x86_64-1.0.0-kvm-and-xen-Snapshot20221012.qcow2

The command line used to fire up the VM can be found in autoinst.txt; in this case it was recorded as:

/usr/bin/qemu-system-x86_64 \
  -device virtio-vga,edid=on,xres=1024,yres=768 \
  -only-migratable \
  -chardev ringbuf,id=serial0,logfile=serial0,logappend=on \
  -serial chardev:serial0 \
  -audiodev none,id=snd0 \
  -device intel-hda \
  -device hda-output,audiodev=snd0 \
  -global isa-fdc.fdtypeA=none \
  -m 2048 \
  -cpu qemu64 \
  -netdev user,id=qanet0 \
  -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 \
  -object rng-random,filename=/dev/urandom,id=rng0 \
  -device virtio-rng-pci,rng=rng0 \
  -device qemu-xhci \
  -device usb-tablet \
  -smp 1 \
  -enable-kvm \
  -no-shutdown \
  -vnc :101,share=force-shared \
  -device virtio-serial \
  -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on \
  -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console \
  -chardev pipe,id=virtio_console1,path=virtio_console1,logfile=virtio_console1.log,logappend=on \
  -device virtconsole,chardev=virtio_console1,name=org.openqa.console.virtio_console1 \
  -chardev socket,path=qmp_socket,server=on,wait=off,id=qmp_socket,logfile=qmp_socket.log,logappend=on \
  -qmp chardev:qmp_socket \
  -S \
  -device virtio-scsi-pci,id=scsi0 \
  -blockdev driver=file,node-name=hd0-overlay0-file,filename=/var/lib/openqa/pool/11/raid/hd0-overlay0,cache.no-flush=on \
  -blockdev driver=qcow2,node-name=hd0-overlay0,file=hd0-overlay0-file,cache.no-flush=on,discard=unmap \
  -device virtio-blk,id=hd0-device,drive=hd0-overlay0,bootindex=0,serial=hd0 \
  -drive id=pflash-code-overlay0,if=pflash,file=/var/lib/openqa/pool/11/raid/pflash-code-overlay0,unit=0,readonly=on \
  -drive id=pflash-vars-overlay0,if=pflash,file=/var/lib/openqa/pool/11/raid/pflash-vars-overlay0,unit=1

Direct access to the worker VMs might be possible with some tricks, but you'd need ssh access to the openQA host (as a jump host to do VNC forwarding).
OK, we've debugged these failures with one guy from Intel. In the end we needed to modify the LTP test (upstream inclusion pending), and the kernel also needs a fix in the slab reclaim code. Both changes should eventually propagate to Tumbleweed, but let me leave this bug open until the changes are at least included upstream.
OK, the kernel patch has made it upstream (it will be in 6.2-rc1) as e83b39d6bbd ("mm: make drop_caches keep reclaiming on all nodes"), and the LTP changes are also upstream as a series ending with 4fefdf340fa ("fanotify10: Make evictable marks tests more reliable"). So the failures for Tumbleweed should go away soon. Closing the bug, hopefully for good ;)