Bug 1212906 - fanotify23 randomly fails on Tumbleweed when run more times
Summary: fanotify23 randomly fails on Tumbleweed when run more times
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel (show other bugs)
Version: Current
Hardware: Other Other
: P5 - None : Normal (vote)
Target Milestone: ---
Assignee: Jan Kara
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-30 21:09 UTC by Petr Vorel
Modified: 2023-08-14 13:20 UTC (History)
1 user (show)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
[PATCH] fanotify23: Make evictable marks tests more reliable (2.14 KB, patch)
2023-08-10 13:07 UTC, Jan Kara
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Petr Vorel 2023-06-30 21:09:53 UTC
LTP fanotify23 test (test for FAN_MARK_EVICTABLE from 5.19) fails, when run more times. I tested it on all archs on the current Tumbleweed kernel [2], it fails everywhere.

It also fails when tested on VM with older 6.2.8-1-default kernel (where it mostly fails, I guess the behavior is the same on all recent version):

# /opt/ltp/testcases/bin/fanotify23 -i5
tst_device.c:97: TINFO: Found free device 0 '/dev/loop0'
tst_test.c:1094: TINFO: Formatting /dev/loop0 with ext2 opts='' extra opts=''
mke2fs 1.46.5 (30-Dec-2021)
tst_test.c:1560: TINFO: Timeout per run is 0h 00m 30s
fanotify23.c:112: TPASS: FAN_MARK_ADD failed with EEXIST as expected when trying to downgrade to evictable mark
fanotify23.c:75: TPASS: FAN_MARK_REMOVE failed with ENOENT as expected after empty mask
fanotify23.c:156: TPASS: Got no events as expected
fanotify23.c:75: TPASS: FAN_MARK_REMOVE failed with ENOENT as expected after drop_caches
fanotify23.c:193: TPASS: got event: mask=4
fanotify23.c:193: TPASS: got event: mask=4
fanotify23.c:112: TPASS: FAN_MARK_ADD failed with EEXIST as expected when trying to downgrade to evictable mark
fanotify23.c:75: TPASS: FAN_MARK_REMOVE failed with ENOENT as expected after empty mask
fanotify23.c:156: TPASS: Got no events as expected
fanotify23.c:75: TPASS: FAN_MARK_REMOVE failed with ENOENT as expected after drop_caches
fanotify23.c:193: TPASS: got event: mask=4
fanotify23.c:193: TPASS: got event: mask=4
fanotify23.c:112: TPASS: FAN_MARK_ADD failed with EEXIST as expected when trying to downgrade to evictable mark
fanotify23.c:75: TPASS: FAN_MARK_REMOVE failed with ENOENT as expected after empty mask
fanotify23.c:156: TPASS: Got no events as expected
fanotify23.c:75: TPASS: FAN_MARK_REMOVE failed with ENOENT as expected after drop_caches
fanotify23.c:193: TPASS: got event: mask=4
fanotify23.c:193: TPASS: got event: mask=4
fanotify23.c:112: TPASS: FAN_MARK_ADD failed with EEXIST as expected when trying to downgrade to evictable mark
fanotify23.c:75: TPASS: FAN_MARK_REMOVE failed with ENOENT as expected after empty mask
fanotify23.c:156: TPASS: Got no events as expected
fanotify23.c:75: TPASS: FAN_MARK_REMOVE failed with ENOENT as expected after drop_caches
fanotify23.c:193: TPASS: got event: mask=4
fanotify23.c:193: TPASS: got event: mask=4
fanotify23.c:112: TPASS: FAN_MARK_ADD failed with EEXIST as expected when trying to downgrade to evictable mark
fanotify23.c:75: TPASS: FAN_MARK_REMOVE failed with ENOENT as expected after empty mask
fanotify23.c:156: TPASS: Got no events as expected
fanotify23.c:81: TFAIL: FAN_MARK_REMOVE did not fail with ENOENT as expected after drop_caches: SUCCESS (0)
fanotify23.c:175: TBROK: read(3,0x5598cbfafc54,24524) failed, returned -1: EAGAIN/EWOULDBLOCK (11)

Summary:
passed   27
failed   1
broken   1
skipped  0
warnings 0

dmesg:
[  371.377308][T29983] loop: module loaded
[  371.380510][T29982] loop0: detected capacity change from 0 to 614400
[  371.441931][    C1] clocksource: timekeeping watchdog on CPU1: Marking clocksource 'tsc' as unstable because the skew is too large:
[  371.443816][    C1] clocksource:                       'kvm-clock' wd_nsec: 515886747 wd_now: 5749149153 wd_last: 572a54c2b8 mask: ffffffffffffffff
[  371.445730][    C1] clocksource:                       'tsc' cs_nsec: 30542623402 cs_now: cff3ae7e60 cs_last: c057e04270 mask: ffffffffffffffff
[  371.447768][    C1] clocksource:                       Clocksource 'tsc' skewed 30026736655 ns (30026 ms) over watchdog 'kvm-clock' interval of 515886747 ns (515 ms)
[  371.449822][    C1] clocksource:                       'kvm-clock' (not 'tsc') is current clocksource.
[  371.451068][    C1] tsc: Marking TSC unstable due to clocksource watchdog
[  371.499239][    C1] operation not supported error, dev loop0, sector 614272 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 2
[  371.501256][    C0] operation not supported error, dev loop0, sector 524 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 2
[  371.503893][    C1] operation not supported error, dev loop0, sector 16908 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 2
[  371.506135][    C0] operation not supported error, dev loop0, sector 32774 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 2
[  371.508716][    C1] operation not supported error, dev loop0, sector 49676 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 2
[  371.511513][    C0] operation not supported error, dev loop0, sector 65542 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 2
[  371.514038][    C0] operation not supported error, dev loop0, sector 82444 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 2
[  371.516522][    C1] operation not supported error, dev loop0, sector 98310 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 2
[  371.519263][    C0] operation not supported error, dev loop0, sector 115212 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 2
[  371.521950][    C1] operation not supported error, dev loop0, sector 131078 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 2
[  371.545611][T29982] EXT4-fs (loop0): mounting ext2 file system using the ext4 subsystem
[  371.548719][T29982] EXT4-fs (loop0): mounted filesystem f9e6e295-a78d-4551-aff2-ce1663458053 without journal. Quota mode: none.
[  371.787570][T30005] fanotify23 (30005): drop_caches: 3
[  371.804729][T30005] EXT4-fs (loop0): unmounting filesystem f9e6e295-a78d-4551-aff2-ce1663458053.
[  371.810253][T30005] EXT4-fs (loop0): mounting ext2 file system using the ext4 subsystem
[  371.812114][T30005] EXT4-fs (loop0): mounted filesystem f9e6e295-a78d-4551-aff2-ce1663458053 without journal. Quota mode: none.
[  371.820273][T30005] fanotify23 (30005): drop_caches: 3
[  371.828227][T30005] EXT4-fs (loop0): unmounting filesystem f9e6e295-a78d-4551-aff2-ce1663458053.
[  371.830160][T30005] EXT4-fs (loop0): mounting ext2 file system using the ext4 subsystem
[  371.831957][T30005] EXT4-fs (loop0): mounted filesystem f9e6e295-a78d-4551-aff2-ce1663458053 without journal. Quota mode: none.
[  371.841016][T30005] fanotify23 (30005): drop_caches: 3
[  371.862288][T30005] EXT4-fs (loop0): unmounting filesystem f9e6e295-a78d-4551-aff2-ce1663458053.
[  371.864165][T30005] EXT4-fs (loop0): mounting ext2 file system using the ext4 subsystem
[  371.866232][T30005] EXT4-fs (loop0): mounted filesystem f9e6e295-a78d-4551-aff2-ce1663458053 without journal. Quota mode: none.
[  371.872592][T30005] fanotify23 (30005): drop_caches: 3
[  371.890441][T30005] EXT4-fs (loop0): unmounting filesystem f9e6e295-a78d-4551-aff2-ce1663458053.
[  371.892294][T30005] EXT4-fs (loop0): mounting ext2 file system using the ext4 subsystem
[  371.894159][T30005] EXT4-fs (loop0): mounted filesystem f9e6e295-a78d-4551-aff2-ce1663458053 without journal. Quota mode: none.
[  371.900270][T30005] fanotify23 (30005): drop_caches: 3
[  371.919626][T29982] EXT4-fs (loop0): unmounting filesystem f9e6e295-a78d-4551-aff2-ce1663458053.

When running on Debian 6.3.0-1-amd64 test does not fail and dmesg does not contain clocksource and tsc, therefore I suspect it's something our kernel specific (our configuration specific? We don't have any fanotify related patch, only something old patches.suse/vfs-add-super_operations-get_inode_dev, which is IMHO unrelated). I can report it to linux-fsdevel ML if it's a general problem. 

[1] https://github.com/linux-test-project/ltp/blob/353f0e4203139f3fc02fa2beb67cdcc7a05298d0/testcases/kernel/syscalls/fanotify/fanotify23.c
[2] https://openqa.opensuse.org/tests/overview?build=fanotify23-10x
Comment 1 Petr Vorel 2023-06-30 21:21:52 UTC
Actually, the previous dmesg was from recent Tumbleweed x86_64 6.3.9-1-default [1].
When running on some Tumbleweed VM with 6.2.8-1-default, I don't see clocksource and tsc in dmesg either and it fails the same. There is actually no difference in dmesg between failing and passing test (only passing test has also unmount related messages, but that's not relevant).

[1] https://openqa.opensuse.org/tests/3397531/file/serial0.txt
Comment 2 Jan Kara 2023-08-10 11:57:08 UTC
Hum, very likely this is again the problem with unreliability of slab reclaim on which this test depends. In bug 1202330 we've fixed this for fanotify10 test (and made reclaim somewhat more reliable in the kernel) but fanotify23 test may need a similar treatment. Let me check...
Comment 3 Jan Kara 2023-08-10 12:12:48 UTC
So indeed I can reproduce the issue (fails on 4th iteration) in my test VM. Looking into it.
Comment 4 Jan Kara 2023-08-10 13:07:10 UTC
Indeed it was the same problem. I'll attach a fixup for the LTP test.
Comment 5 Jan Kara 2023-08-10 13:07:56 UTC
Created attachment 868747 [details]
[PATCH] fanotify23: Make evictable marks tests more reliable
Comment 6 Jan Kara 2023-08-14 13:20:10 UTC
OK, v2 of the patch got accepted into LTP upstream. So let me close the bug.