Bug 1212217 - [Build 20230611] openQA test fails in snapper_create: btrfsprogs 6.3.1 behaves differently (?)
Summary: [Build 20230611] openQA test fails in snapper_create: btrfsprogs 6.3.1 behave...
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Other (show other bugs)
Version: Current
Hardware: Other Other
: P5 - None : Major (vote)
Target Milestone: ---
Assignee: David Sterba
QA Contact: E-mail List
URL: https://openqa.opensuse.org/tests/335...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-12 07:50 UTC by Dominique Leuenberger
Modified: 2023-10-13 20:40 UTC (History)
4 users (show)

See Also:
Found By: openQA
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dominique Leuenberger 2023-06-12 07:50:37 UTC
## Observation

btrfsprogs has been updated from version 6.3 to 6.3.1, which results in the existing openQA test failing to pass now.


The relevant part in the console output seems to be

> snapper list
> EOT_kEL1H
kEL1H-0-
# echo kEL1H; bash -oe pipefail /tmp/scriptkEL1H.sh ; echo SCRIPT_FINISHEDkEL1H-$?-
kEL1H
  # | Type   | Pre # | Date                     | User | Used Space | Cleanup        | Description                                   | Userdata                                      
----+--------+-------+--------------------------+------+------------+----------------+-----------------------------------------------+-----------------------------------------------
 0  | single |       |                          | root |            |                | current                                       |                                               
 1* | single |       | Sun Jun 11 20:36:02 2023 | root |  16.00 KiB |                | first root filesystem                         |                                               
 2  | single |       | Sun Jun 11 20:38:02 2023 | root |   3.98 MiB | number         | after installation                            | important=yes                                 
 3  | pre    |       | Sun Jun 11 21:32:46 2023 | root | 304.00 KiB | number         | zypp(zypper)                                  | important=no                                  
 4  | post   |     3 | Sun Jun 11 21:32:58 2023 | root |   1.07 MiB | number         |                                               | important=no                                  
 5  | pre    |       | Sun Jun 11 21:33:00 2023 | root |  64.00 KiB | number         | zypp(zypper)                                  | important=no                                  
 6  | post   |     5 | Sun Jun 11 21:33:00 2023 | root |   4.17 MiB | number         |                                               | important=no                                  
 7  | single |       | Sun Jun 11 21:33:21 2023 | root |  64.00 KiB |                | before undochange test                        |                                               
 8  | single |       | Sun Jun 11 21:33:29 2023 | root |  96.00 KiB |                | after undochange test                         |                                               
 9  | single |       | Sun Jun 11 21:34:05 2023 | root |  96.00 KiB |                | before undochange test                        |                                               
10  | single |       | Sun Jun 11 21:34:13 2023 | root |  96.00 KiB |                | after undochange test                         |                                               
11  | single |       | Sun Jun 11 21:35:25 2023 | root |  16.00 KiB | number         | type=single,cleanup_algorithm=number          | cleanup_algorithm=number, type=single         
12  | single |       | Sun Jun 11 21:35:27 2023 | root |  16.00 KiB | timeline       | type=single,cleanup_algorithm=timeline        | cleanup_algorithm=timeline, type=single       
13  | single |       | Sun Jun 11 21:35:28 2023 | root |  16.00 KiB | empty-pre-post | type=single,cleanup_algorithm=empty-pre-post  | cleanup_algorithm=empty-pre-post, type=single 
14  | pre    |       | Sun Jun 11 21:35:29 2023 | root |  16.00 KiB | number         | type=command,cleanup_algorithm=number         | cleanup_algorithm=number, type=command        
15  | post   |    14 | Sun Jun 11 21:35:30 2023 | root |  16.00 KiB | number         | type=command,cleanup_algorithm=number         | cleanup_algorithm=number, type=command        
16  | pre    |       | Sun Jun 11 21:35:31 2023 | root |  16.00 KiB | timeline       | type=command,cleanup_algorithm=timeline       | cleanup_algorithm=timeline, type=command      
17  | post   |    16 | Sun Jun 11 21:35:32 2023 | root |  16.00 KiB | timeline       | type=command,cleanup_algorithm=timeline       | cleanup_algorithm=timeline, type=command      
18  | pre    |       | Sun Jun 11 21:35:33 2023 | root |  16.00 KiB | empty-pre-post | type=command,cleanup_algorithm=empty-pre-post | cleanup_algorithm=empty-pre-post, type=command
19  | post   |    18 | Sun Jun 11 21:35:34 2023 | root |  16.00 KiB | empty-pre-post | type=command,cleanup_algorithm=empty-pre-post | cleanup_algorithm=empty-pre-post, type=command
20  | pre    |       | Sun Jun 11 21:35:35 2023 | root |  16.00 KiB | number         | type=pre,cleanup_algorithm=number             | cleanup_algorithm=number, type=pre            
21  | pre    |       | Sun Jun 11 21:35:39 2023 | root |  16.00 KiB | timeline       | type=pre,cleanup_algorithm=timeline           | cleanup_algorithm=timeline, type=pre          
22  | pre    |       | Sun Jun 11 21:35:42 2023 | root |  16.00 KiB | empty-pre-post | type=pre,cleanup_algorithm=empty-pre-post     | cleanup_algorithm=empty-pre-post, type=pre    
23  | post   |    22 | Sun Jun 11 21:35:45 2023 | root |  16.00 KiB | number         | type=post,cleanup_algorithm=number            | cleanup_algorithm=number, type=post           
24  | post   |    21 | Sun Jun 11 21:35:47 2023 | root |  16.00 KiB | timeline       | type=post,cleanup_algorithm=timeline          | cleanup_algorithm=timeline, type=post         
25  | post   |    20 | Sun Jun 11 21:35:48 2023 | root |  16.00 KiB | empty-pre-post | type=post,cleanup_algorithm=empty-pre-post    | cleanup_algorithm=empty-pre-post, type=post   
SCRIPT_FINISHEDkEL1H-0-
# cat > /tmp/scriptWWujw.sh << 'EOT_WWujw'; echo WWujw-$?-
> snapper list | tail -n1 | awk -F '|' '{ print $1 }' | tr -d '[:space:]*' | awk '{ print ">>>" $1 "<<<" }'
> EOT_WWujw
WWujw-0-
# echo WWujw; bash -oe pipefail /tmp/scriptWWujw.sh ; echo SCRIPT_FINISHEDWWujw-$?-
WWujw
>>>25<<<
SCRIPT_FINISHEDWWujw-0-
# snapper delete --sync 11-25; echo LL~1H-$?-
Config is in use.
LL~1H-1-





openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-extra_tests_filesystem@64bit fails in
[snapper_create](https://openqa.opensuse.org/tests/3351759/modules/snapper_create/steps/169)

## Test suite description
Maintainer: QE Core

Filesystem related tests, for example snapper and btrfs features.


## Reproducible

Fails since (at least) Build [20230611](https://openqa.opensuse.org/tests/3351759) (current job)


## Expected result

Last good: [20230610](https://openqa.opensuse.org/tests/3350495) (or more recent)


## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD&machine=64bit&test=extra_tests_filesystem&version=Tumbleweed)
Comment 1 David Sterba 2023-08-28 14:40:47 UTC
This bug is blocking btrfsprogs package updates but I haven't found any commit or other reason that would cause the openQA test failure. Is it possible to get fresh test results? I can't find a recent version of the opensuse-Tumbleweed-DVD-x86_64 build job on openqa.opensuse.org, the Tumbleweed tests don't have the snapper tests.
Comment 2 Dominique Leuenberger 2023-08-28 15:38:34 UTC
(In reply to David Sterba from comment #1)
> This bug is blocking btrfsprogs package updates but I haven't found any
> commit or other reason that would cause the openQA test failure. Is it
> possible to get fresh test results? I can't find a recent version of the
> opensuse-Tumbleweed-DVD-x86_64 build job on openqa.opensuse.org, the
> Tumbleweed tests don't have the snapper tests.

https://openqa.opensuse.org/tests/3526753#step/snapper_create/1

This is btrfsprogs 6.3 - 6.3.1 had been reverted due to bug https://bugzilla.opensuse.org/show_bug.cgi?id=1212217
Comment 3 Dominique Leuenberger 2023-08-28 15:40:00 UTC
(In reply to Dominique Leuenberger from comment #2)
> (In reply to David Sterba from comment #1)
> > This bug is blocking btrfsprogs package updates but I haven't found any
> > commit or other reason that would cause the openQA test failure. Is it
> > possible to get fresh test results? I can't find a recent version of the
> > opensuse-Tumbleweed-DVD-x86_64 build job on openqa.opensuse.org, the
> > Tumbleweed tests don't have the snapper tests.
> 
> https://openqa.opensuse.org/tests/3526753#step/snapper_create/1
> 
> This is btrfsprogs 6.3 - 6.3.1 had been reverted due to bug
> https://bugzilla.opensuse.org/show_bug.cgi?id=1212217

erm.. that is circular :)

The test runs on openQA, that';s for sure. We can tr to clone that job with an external addon repo injected to test with btrfsprogs 6.3.1 - fact is though: reverting btrfsprogs alone made the tests pass again
Comment 4 Dominique Leuenberger 2023-08-28 15:42:11 UTC
or optionally - grab the qcow2 image linked in https://openqa.opensuse.org/tests/3526753#step/snapper_create/1 assets

you can boot this, update btrfsprogs and re-inject this qcow into a local openQA instance to let it rerun the tests - or try to replay what the test does (there is of course always the option of the test doing something wrong)
Comment 5 Dominique Leuenberger 2023-09-01 07:10:26 UTC
btrfsprogs 6.3.3 was checked in to Factory - and openQA fails again:

https://openqa.opensuse.org/tests/3541566#step/snapper_create/172
Comment 6 Dominique Leuenberger 2023-09-01 07:13:33 UTC
Also 
https://openqa.opensuse.org/tests/3541586#step/snapper_create/172
Comment 7 Fabian Vogt 2023-09-01 09:17:57 UTC
libbtrfs0 broke ABI, so snapper does garbage. Commit 83ab92512e79a5f8e3d2afd06e0c28d41061fd32 needs to be redone or the soversion bumped.

In /var/log/snapper.log, it's visible that btrfs send/receive ioctls for snapshot diffing failed with ENOENT, so snapper used the fallback of manually comparing the files inside. This takes much longer, so races resulting in errors like "Config in use" are more likely.

With strace it was visible that snapper passed parent_root=0 to BTRFS_IOC_SEND, resulting in -ENOENT. With gdb I stepped through the code and noticed that the contents of struct subvol_info returned to snapper are at an offset, so the root_id member was not where snapper expected it to be, so it read 0 instead.
Comment 8 Wenruo Qu 2023-09-01 09:42:41 UTC
Fabian is right, I didn't notice the change of subvol_info is at the beginning of the structure.

Thus it would lead to a incompatible change of API.

Would anyone mind to test if the following patch solves the regression?
https://lore.kernel.org/linux-btrfs/4d54ca8fb8605704260aad205acd6185fe73fb49.1693561063.git.wqu@suse.com/T/#u
Comment 9 Fabian Vogt 2023-09-01 10:02:16 UTC
(In reply to Wenruo Qu from comment #8)
> Fabian is right, I didn't notice the change of subvol_info is at the
> beginning of the structure.
> 
> Thus it would lead to a incompatible change of API.
> 
> Would anyone mind to test if the following patch solves the regression?
> https://lore.kernel.org/linux-btrfs/4d54ca8fb8605704260aad205acd6185fe73fb49.
> 1693561063.git.wqu@suse.com/T/#u

It does. Is it intentional that the members of subvol_uuid_search are now prefixed with "__"? This breaks API and the "__" prefix is also invalid according to the C standard as such identifiers are reserved by the standard.
Comment 10 David Sterba 2023-09-01 10:10:48 UTC
Thanks Fabian for tracking it down. I'll do a full revert of the patch so the ABI is on the 6.3 level again.
Comment 11 Wenruo Qu 2023-09-01 10:23:59 UTC
(In reply to Fabian Vogt from comment #9)
> (In reply to Wenruo Qu from comment #8)
> > Fabian is right, I didn't notice the change of subvol_info is at the
> > beginning of the structure.
> > 
> > Thus it would lead to a incompatible change of API.
> > 
> > Would anyone mind to test if the following patch solves the regression?
> > https://lore.kernel.org/linux-btrfs/4d54ca8fb8605704260aad205acd6185fe73fb49.
> > 1693561063.git.wqu@suse.com/T/#u
> 
> It does. Is it intentional that the members of subvol_uuid_search are now
> prefixed with "__"? This breaks API and the "__" prefix is also invalid
> according to the C standard as such identifiers are reserved by the standard.

It was to discourage any usage of the old members.

But since David is going to revert that one, it doesn't matter anymore.

Thanks for pinning down the breakage.
Comment 12 David Sterba 2023-09-01 13:21:01 UTC
Version 6.5 has the fix and is now in the OBS pipeline.
Comment 13 Dominique Leuenberger 2023-10-12 12:49:46 UTC
(In reply to David Sterba from comment #12)
> Version 6.5 has the fix and is now in the OBS pipeline.

this has been fixed; thanks all for working on this and identifying the underlying ABI break.