Bugzilla – Bug 1212217
[Build 20230611] openQA test fails in snapper_create: btrfsprogs 6.3.1 behaves differently (?)
Last modified: 2023-10-13 20:40:10 UTC
## Observation btrfsprogs has been updated from version 6.3 to 6.3.1, which results in the existing openQA test failing to pass now. The relevant part in the console output seems to be > snapper list > EOT_kEL1H kEL1H-0- # echo kEL1H; bash -oe pipefail /tmp/scriptkEL1H.sh ; echo SCRIPT_FINISHEDkEL1H-$?- kEL1H # | Type | Pre # | Date | User | Used Space | Cleanup | Description | Userdata ----+--------+-------+--------------------------+------+------------+----------------+-----------------------------------------------+----------------------------------------------- 0 | single | | | root | | | current | 1* | single | | Sun Jun 11 20:36:02 2023 | root | 16.00 KiB | | first root filesystem | 2 | single | | Sun Jun 11 20:38:02 2023 | root | 3.98 MiB | number | after installation | important=yes 3 | pre | | Sun Jun 11 21:32:46 2023 | root | 304.00 KiB | number | zypp(zypper) | important=no 4 | post | 3 | Sun Jun 11 21:32:58 2023 | root | 1.07 MiB | number | | important=no 5 | pre | | Sun Jun 11 21:33:00 2023 | root | 64.00 KiB | number | zypp(zypper) | important=no 6 | post | 5 | Sun Jun 11 21:33:00 2023 | root | 4.17 MiB | number | | important=no 7 | single | | Sun Jun 11 21:33:21 2023 | root | 64.00 KiB | | before undochange test | 8 | single | | Sun Jun 11 21:33:29 2023 | root | 96.00 KiB | | after undochange test | 9 | single | | Sun Jun 11 21:34:05 2023 | root | 96.00 KiB | | before undochange test | 10 | single | | Sun Jun 11 21:34:13 2023 | root | 96.00 KiB | | after undochange test | 11 | single | | Sun Jun 11 21:35:25 2023 | root | 16.00 KiB | number | type=single,cleanup_algorithm=number | cleanup_algorithm=number, type=single 12 | single | | Sun Jun 11 21:35:27 2023 | root | 16.00 KiB | timeline | type=single,cleanup_algorithm=timeline | cleanup_algorithm=timeline, type=single 13 | single | | Sun Jun 11 21:35:28 2023 | root | 16.00 KiB | empty-pre-post | type=single,cleanup_algorithm=empty-pre-post | cleanup_algorithm=empty-pre-post, type=single 14 | pre | | Sun Jun 11 21:35:29 2023 | root | 16.00 KiB | number | type=command,cleanup_algorithm=number | cleanup_algorithm=number, type=command 15 | post | 14 | Sun Jun 11 21:35:30 2023 | root | 16.00 KiB | number | type=command,cleanup_algorithm=number | cleanup_algorithm=number, type=command 16 | pre | | Sun Jun 11 21:35:31 2023 | root | 16.00 KiB | timeline | type=command,cleanup_algorithm=timeline | cleanup_algorithm=timeline, type=command 17 | post | 16 | Sun Jun 11 21:35:32 2023 | root | 16.00 KiB | timeline | type=command,cleanup_algorithm=timeline | cleanup_algorithm=timeline, type=command 18 | pre | | Sun Jun 11 21:35:33 2023 | root | 16.00 KiB | empty-pre-post | type=command,cleanup_algorithm=empty-pre-post | cleanup_algorithm=empty-pre-post, type=command 19 | post | 18 | Sun Jun 11 21:35:34 2023 | root | 16.00 KiB | empty-pre-post | type=command,cleanup_algorithm=empty-pre-post | cleanup_algorithm=empty-pre-post, type=command 20 | pre | | Sun Jun 11 21:35:35 2023 | root | 16.00 KiB | number | type=pre,cleanup_algorithm=number | cleanup_algorithm=number, type=pre 21 | pre | | Sun Jun 11 21:35:39 2023 | root | 16.00 KiB | timeline | type=pre,cleanup_algorithm=timeline | cleanup_algorithm=timeline, type=pre 22 | pre | | Sun Jun 11 21:35:42 2023 | root | 16.00 KiB | empty-pre-post | type=pre,cleanup_algorithm=empty-pre-post | cleanup_algorithm=empty-pre-post, type=pre 23 | post | 22 | Sun Jun 11 21:35:45 2023 | root | 16.00 KiB | number | type=post,cleanup_algorithm=number | cleanup_algorithm=number, type=post 24 | post | 21 | Sun Jun 11 21:35:47 2023 | root | 16.00 KiB | timeline | type=post,cleanup_algorithm=timeline | cleanup_algorithm=timeline, type=post 25 | post | 20 | Sun Jun 11 21:35:48 2023 | root | 16.00 KiB | empty-pre-post | type=post,cleanup_algorithm=empty-pre-post | cleanup_algorithm=empty-pre-post, type=post SCRIPT_FINISHEDkEL1H-0- # cat > /tmp/scriptWWujw.sh << 'EOT_WWujw'; echo WWujw-$?- > snapper list | tail -n1 | awk -F '|' '{ print $1 }' | tr -d '[:space:]*' | awk '{ print ">>>" $1 "<<<" }' > EOT_WWujw WWujw-0- # echo WWujw; bash -oe pipefail /tmp/scriptWWujw.sh ; echo SCRIPT_FINISHEDWWujw-$?- WWujw >>>25<<< SCRIPT_FINISHEDWWujw-0- # snapper delete --sync 11-25; echo LL~1H-$?- Config is in use. LL~1H-1- openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-extra_tests_filesystem@64bit fails in [snapper_create](https://openqa.opensuse.org/tests/3351759/modules/snapper_create/steps/169) ## Test suite description Maintainer: QE Core Filesystem related tests, for example snapper and btrfs features. ## Reproducible Fails since (at least) Build [20230611](https://openqa.opensuse.org/tests/3351759) (current job) ## Expected result Last good: [20230610](https://openqa.opensuse.org/tests/3350495) (or more recent) ## Further details Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD&machine=64bit&test=extra_tests_filesystem&version=Tumbleweed)
This bug is blocking btrfsprogs package updates but I haven't found any commit or other reason that would cause the openQA test failure. Is it possible to get fresh test results? I can't find a recent version of the opensuse-Tumbleweed-DVD-x86_64 build job on openqa.opensuse.org, the Tumbleweed tests don't have the snapper tests.
(In reply to David Sterba from comment #1) > This bug is blocking btrfsprogs package updates but I haven't found any > commit or other reason that would cause the openQA test failure. Is it > possible to get fresh test results? I can't find a recent version of the > opensuse-Tumbleweed-DVD-x86_64 build job on openqa.opensuse.org, the > Tumbleweed tests don't have the snapper tests. https://openqa.opensuse.org/tests/3526753#step/snapper_create/1 This is btrfsprogs 6.3 - 6.3.1 had been reverted due to bug https://bugzilla.opensuse.org/show_bug.cgi?id=1212217
(In reply to Dominique Leuenberger from comment #2) > (In reply to David Sterba from comment #1) > > This bug is blocking btrfsprogs package updates but I haven't found any > > commit or other reason that would cause the openQA test failure. Is it > > possible to get fresh test results? I can't find a recent version of the > > opensuse-Tumbleweed-DVD-x86_64 build job on openqa.opensuse.org, the > > Tumbleweed tests don't have the snapper tests. > > https://openqa.opensuse.org/tests/3526753#step/snapper_create/1 > > This is btrfsprogs 6.3 - 6.3.1 had been reverted due to bug > https://bugzilla.opensuse.org/show_bug.cgi?id=1212217 erm.. that is circular :) The test runs on openQA, that';s for sure. We can tr to clone that job with an external addon repo injected to test with btrfsprogs 6.3.1 - fact is though: reverting btrfsprogs alone made the tests pass again
or optionally - grab the qcow2 image linked in https://openqa.opensuse.org/tests/3526753#step/snapper_create/1 assets you can boot this, update btrfsprogs and re-inject this qcow into a local openQA instance to let it rerun the tests - or try to replay what the test does (there is of course always the option of the test doing something wrong)
btrfsprogs 6.3.3 was checked in to Factory - and openQA fails again: https://openqa.opensuse.org/tests/3541566#step/snapper_create/172
Also https://openqa.opensuse.org/tests/3541586#step/snapper_create/172
libbtrfs0 broke ABI, so snapper does garbage. Commit 83ab92512e79a5f8e3d2afd06e0c28d41061fd32 needs to be redone or the soversion bumped. In /var/log/snapper.log, it's visible that btrfs send/receive ioctls for snapshot diffing failed with ENOENT, so snapper used the fallback of manually comparing the files inside. This takes much longer, so races resulting in errors like "Config in use" are more likely. With strace it was visible that snapper passed parent_root=0 to BTRFS_IOC_SEND, resulting in -ENOENT. With gdb I stepped through the code and noticed that the contents of struct subvol_info returned to snapper are at an offset, so the root_id member was not where snapper expected it to be, so it read 0 instead.
Fabian is right, I didn't notice the change of subvol_info is at the beginning of the structure. Thus it would lead to a incompatible change of API. Would anyone mind to test if the following patch solves the regression? https://lore.kernel.org/linux-btrfs/4d54ca8fb8605704260aad205acd6185fe73fb49.1693561063.git.wqu@suse.com/T/#u
(In reply to Wenruo Qu from comment #8) > Fabian is right, I didn't notice the change of subvol_info is at the > beginning of the structure. > > Thus it would lead to a incompatible change of API. > > Would anyone mind to test if the following patch solves the regression? > https://lore.kernel.org/linux-btrfs/4d54ca8fb8605704260aad205acd6185fe73fb49. > 1693561063.git.wqu@suse.com/T/#u It does. Is it intentional that the members of subvol_uuid_search are now prefixed with "__"? This breaks API and the "__" prefix is also invalid according to the C standard as such identifiers are reserved by the standard.
Thanks Fabian for tracking it down. I'll do a full revert of the patch so the ABI is on the 6.3 level again.
(In reply to Fabian Vogt from comment #9) > (In reply to Wenruo Qu from comment #8) > > Fabian is right, I didn't notice the change of subvol_info is at the > > beginning of the structure. > > > > Thus it would lead to a incompatible change of API. > > > > Would anyone mind to test if the following patch solves the regression? > > https://lore.kernel.org/linux-btrfs/4d54ca8fb8605704260aad205acd6185fe73fb49. > > 1693561063.git.wqu@suse.com/T/#u > > It does. Is it intentional that the members of subvol_uuid_search are now > prefixed with "__"? This breaks API and the "__" prefix is also invalid > according to the C standard as such identifiers are reserved by the standard. It was to discourage any usage of the old members. But since David is going to revert that one, it doesn't matter anymore. Thanks for pinning down the breakage.
Version 6.5 has the fix and is now in the OBS pipeline.
(In reply to David Sterba from comment #12) > Version 6.5 has the fix and is now in the OBS pipeline. this has been fixed; thanks all for working on this and identifying the underlying ABI break.