Bug 1202330 - [Build 20220810] openQA test fails in fanotify10
Summary: [Build 20220810] openQA test fails in fanotify10
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
Version: Current
Hardware: Other Other
Importance: P3 - Medium : Normal
Target Milestone: ---
Assignee: Jan Kara
QA Contact: E-mail List
URL: https://openqa.opensuse.org/tests/250...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-08-11 11:21 UTC by Dominique Leuenberger
Modified: 2022-12-14 17:46 UTC

See Also:
Found By: openQA
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---


Attachments
[PATCH] syscalls/fanotify10: Make evictable marks test more reliable (1.55 KB, patch)
2022-08-24 11:36 UTC, Jan Kara

Description Dominique Leuenberger 2022-08-11 11:21:13 UTC
## Observation


fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:538: TFAIL: group 0 (4) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 1 (4) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 2 (4) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 0 (0) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 1 (0) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 2 (0) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 0 (e00) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 1 (e00) with FAN_MARK_MOUNT did not get event
fanotify10.c:538: TFAIL: group 2 (e00) with FAN_MARK_MOUNT did not get event
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:340: TFAIL: Unexpected inode mark (mflags=240, mask=8000020 ignored_mask=20)
fanotify10.c:538: TFAIL: group 0 (4) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 1 (4) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 2 (4) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 0 (0) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 1 (0) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 2 (0) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 0 (e00) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 1 (e00) with FAN_MARK_FILESYSTEM did not get event
fanotify10.c:538: TFAIL: group 2 (e00) with FAN_MARK_FILESYSTEM did not get event
Failed.
Test took approximately 3.23233270202763 seconds
Some test output could not be parsed: 14 lines were ignored.
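The hex fields in the repeated `Unexpected inode mark` line can be decoded against the constants in the Linux UAPI header `<linux/fanotify.h>`. The following is a minimal sketch, assuming the test prints the three values in hexadecimal; the helper names `decode` and `decode_tfail` are made up for illustration and are not part of LTP.

```python
import re

# Flag values as defined in <linux/fanotify.h>; only the bits needed here.
MARK_FLAGS = {
    0x040: "FAN_MARK_IGNORED_SURV_MODIFY",
    0x200: "FAN_MARK_EVICTABLE",
    0x400: "FAN_MARK_IGNORE",
}
EVENT_BITS = {
    0x00000020: "FAN_OPEN",
    0x08000000: "FAN_EVENT_ON_CHILD",
}

TFAIL_RE = re.compile(
    r"mflags=([0-9a-f]+), mask=([0-9a-f]+) ignored_mask=([0-9a-f]+)"
)


def decode(value, names):
    """Split a bitmask into known flag names; leftover bits are kept as hex."""
    parts = [name for bit, name in sorted(names.items()) if value & bit]
    unknown = value & ~sum(names)
    if unknown:
        parts.append(hex(unknown))
    return parts


def decode_tfail(line):
    """Decode one 'Unexpected inode mark' line (hypothetical helper)."""
    m = TFAIL_RE.search(line)
    if m is None:
        return None
    mflags, mask, ignored = (int(g, 16) for g in m.groups())
    return {
        "mflags": decode(mflags, MARK_FLAGS),
        "mask": decode(mask, EVENT_BITS),
        "ignored_mask": decode(ignored, EVENT_BITS),
    }


if __name__ == "__main__":
    line = ("fanotify10.c:340: TFAIL: Unexpected inode mark "
            "(mflags=240, mask=8000020 ignored_mask=20)")
    print(decode_tfail(line))
```

Read this way, the leftover mark is an evictable inode mark (`FAN_MARK_EVICTABLE | FAN_MARK_IGNORED_SURV_MODIFY`) with `FAN_OPEN` in its ignore mask, i.e. a mark the test expected to disappear once the inode was evicted.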


openQA test in scenario opensuse-Tumbleweed-JeOS-for-kvm-and-xen-x86_64-jeos-ltp-syscalls@uefi_virtio-2G fails in
[fanotify10](https://openqa.opensuse.org/tests/2508855/modules/fanotify10/steps/7)

## Test suite description



## Reproducible

Fails since (at least) Build [20220810](https://openqa.opensuse.org/tests/2508674)


## Expected result

Last good: [20220809](https://openqa.opensuse.org/tests/2507021) (or more recent)


## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=JeOS-for-kvm-and-xen&machine=uefi_virtio-2G&test=jeos-ltp-syscalls&version=Tumbleweed)
Comment 1 Jan Kara 2022-08-16 10:37:08 UTC
This failure is due to unreliable testing of evictable ignore masks. I believe Amir is working with LTP upstream to resolve the issue, but let me check for progress... Hum, it probably got forgotten. The last email I've found regarding this is: https://lore.kernel.org/all/CAOQ4uxiJ2kb42XzQc8P2cZ6LKdrYNK3-P9u_cLS_WHYx4LzwzA@mail.gmail.com

I'll have a look at what we can do here, but either way it is a test issue, not a bug.
Comment 2 Jan Kara 2022-08-24 11:35:31 UTC
Dominique, I have tried and I'm not able to reproduce the issue locally. The underlying problem is that for some reason "echo 3 >/proc/sys/vm/drop_caches" does not evict the inode we need. The attached patch might help. Are you able to test it in openQA?
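For readers who want to observe the symptom by hand, here is a rough sketch of the two steps involved: dropping caches, then looking for evictable inode marks that are still listed in the fanotify group's `/proc/<pid>/fdinfo/<fd>` entry. It assumes the `fanotify ino:... mflags:... mask:... ignored_mask:...` line format described in the kernel's proc filesystem documentation; the function names are hypothetical, and this is not how the LTP test itself is implemented.

```python
import os
import re

FAN_MARK_EVICTABLE = 0x200  # value from <linux/fanotify.h>

# Inode mark line as it appears in /proc/<pid>/fdinfo/<fd> for a fanotify fd.
INODE_MARK = re.compile(
    r"^fanotify ino:(?P<ino>[0-9a-f]+) sdev:(?P<sdev>[0-9a-f]+) "
    r"mflags:(?P<mflags>[0-9a-f]+) mask:(?P<mask>[0-9a-f]+) "
    r"ignored_mask:(?P<ignored_mask>[0-9a-f]+)",
    re.MULTILINE,
)


def drop_caches(path="/proc/sys/vm/drop_caches", mode=3):
    """Sync, then write `mode` to drop_caches (3 = page cache and slab).

    Needs root on a real system. `path` is a parameter only so that the
    helper can be tried against an ordinary file.
    """
    os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")


def inode_marks(fdinfo_text):
    """Return every fanotify inode mark in the fdinfo text as a dict of ints."""
    return [
        {key: int(val, 16) for key, val in m.groupdict().items()}
        for m in INODE_MARK.finditer(fdinfo_text)
    ]


def surviving_evictable_marks(fdinfo_text):
    """Evictable inode marks still attached, i.e. inodes that were not evicted."""
    return [m for m in inode_marks(fdinfo_text)
            if m["mflags"] & FAN_MARK_EVICTABLE]
```

On an affected system one would call `drop_caches()` as root and then pass the contents of the group's fdinfo file to `surviving_evictable_marks()`; a non-empty result corresponds to the `Unexpected inode mark` failures in the description.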
Comment 3 Jan Kara 2022-08-24 11:36:24 UTC
Created attachment 861026 [details]
[PATCH] syscalls/fanotify10: Make evictable marks test more reliable
Comment 4 Jan Kara 2022-08-26 14:07:49 UTC
The fix has now been merged into the LTP repository. So let's close this as hopefully fixed.
Comment 5 Dominique Leuenberger 2022-09-06 11:03:57 UTC
https://openqa.opensuse.org/tests/2627241#step/fanotify10/7

This still seems to be a problem. Two weeks have passed; is that 'expected'?
Comment 6 Jan Kara 2022-09-06 11:42:19 UTC
I can see my fix was merged into the upstream LTP repository on Aug 28 as commit 48cfd7a9977e ("syscalls/fanotify10: Make evictable marks test more reliable"). Looking at the openQA log, it already seems to be using LTP as of commit 14e31797926a, which happened after that, so it should indeed contain the fix. Hrm... needs more poking...
Comment 7 Jan Kara 2022-09-26 13:18:17 UTC
Dominique, I got back to this, but despite my efforts I was not able to reproduce the failure in my VM. I do see some possibilities in the code for reclaim to bail out early without evicting the inode we want to evict, but I don't see any obvious condition that should trigger in openQA. Would it be possible for me to connect to the openQA VM and reproduce & debug the issue there?
Comment 8 Jan Kara 2022-10-12 14:21:28 UTC
Ping Dominique?
Comment 9 Dominique Leuenberger 2022-10-13 12:57:48 UTC
(In reply to Jan Kara from comment #8)
> Ping Dominique?

Apologies, I was 'out of order' for the last 2.5 weeks.

Technically, the disk image used to run the tests can be downloaded from openQA, in the relevant tests under Assets.

i.e. https://openqa.opensuse.org/tests/2797490# => 
https://openqa.opensuse.org/tests/2797490/asset/hdd/openSUSE-Tumbleweed-Minimal-VM.x86_64-1.0.0-kvm-and-xen-Snapshot20221012.qcow2

The command line used to fire up the VM can be found in autoinst.txt; in this case it was recorded as:
/usr/bin/qemu-system-x86_64 -device virtio-vga,edid=on,xres=1024,yres=768 -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -audiodev none,id=snd0 -device intel-hda -device hda-output,audiodev=snd0 -global isa-fdc.fdtypeA=none -m 2048 -cpu qemu64 -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -device qemu-xhci -device usb-tablet -smp 1 -enable-kvm -no-shutdown -vnc :101,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev pipe,id=virtio_console1,path=virtio_console1,logfile=virtio_console1.log,logappend=on -device virtconsole,chardev=virtio_console1,name=org.openqa.console.virtio_console1 -chardev socket,path=qmp_socket,server=on,wait=off,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-overlay0-file,filename=/var/lib/openqa/pool/11/raid/hd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0-overlay0,file=hd0-overlay0-file,cache.no-flush=on,discard=unmap -device virtio-blk,id=hd0-device,drive=hd0-overlay0,bootindex=0,serial=hd0 -drive id=pflash-code-overlay0,if=pflash,file=/var/lib/openqa/pool/11/raid/pflash-code-overlay0,unit=0,readonly=on -drive id=pflash-vars-overlay0,if=pflash,file=/var/lib/openqa/pool/11/raid/pflash-vars-overlay0,unit=1

Direct access to the worker VMs might be possible with some tricks; you'd need SSH access to the openQA host though (as a jumphost to do VNC forwarding).
Comment 10 Jan Kara 2022-11-23 12:16:05 UTC
OK, we've debugged these failures with one guy from Intel. In the end we needed to modify the LTP test (upstream inclusion pending), and the kernel also needs a fix in the slab reclaim code. Both changes should eventually propagate to Tumbleweed, but let me leave this bug open until the changes are at least included upstream.
Comment 11 Jan Kara 2022-12-14 17:46:42 UTC
OK, the kernel patch has made it upstream (will be in 6.2-rc1) as e83b39d6bbd ("mm: make drop_caches keep reclaiming on all nodes") and the LTP changes are also upstream as a series ending with 4fefdf340fa ("fanotify10: Make evictable marks tests more reliable"). So the failures for Tumbleweed should go away soon. Closing the bug, hopefully for good ;)