Bug 1196123

Summary: [Build 98.1] openQA test fails in yast2_bootloader - Probing file system with UUID xx failed.
Product: [openSUSE] PUBLIC SUSE Linux Enterprise Server 15 SP4 Reporter: WEI GAO <wegao>
Component: Basesystem Assignee: Ancor Gonzalez Sosa <ancor>
Status: VERIFIED FIXED QA Contact:
Severity: Normal    
Priority: P2 - High CC: ancor, aschnell, bchou, chcao, hluo, jkowalczyk, leli, richard.fan, rtsvetkov, wegao, ysun, yuwang
Version: unspecified   
Target Milestone: ---   
Hardware: S/390-64   
OS: Other   
URL: https://openqa.suse.de/tests/8181666/modules/yast2_bootloader/steps/10
See Also: https://bugzilla.suse.com/show_bug.cgi?id=1196326
Whiteboard:
Found By: openQA Services Priority:
Business Priority: Blocker: Yes
Marketing QA Status: --- IT Deployment: ---
Attachments: parta, b, c

Description WEI GAO 2022-02-18 03:15:48 UTC
Created attachment 856284 [details]
parta

## Observation

openQA test in scenario sle-15-SP4-Regression-on-Migration-from-SLE15-SPx-s390x-offline_sles15sp3_pscc_basesys-srv-desk-dev-contm-lgm-tsm-wsm-pcm_all_full_console@s390x-kvm-sle12 fails in
[yast2_bootloader](https://openqa.suse.de/tests/8181666/modules/yast2_bootloader/steps/10)

## Test suite description
After migration, running yast2_bootloader triggers an error popup. The log shows:

2022-02-17 09:05:06 <3> susetest(6137) [libstorage] CallbacksImpl.cc(error_callback):183 CAUGHT:	 Command failed: "/usr/bin/lsattr -d '/tmp/libstorage-0BdG4e/tmp-mount-QnwgYf/@'"
2022-02-17 09:05:06 <1> susetest(6137) [Ruby] callbacks/issues_callback.rb(error):57 libstorage-ng reported an error, generating an issue
2022-02-17 09:05:06 <1> susetest(6137) [Ruby] callbacks/issues_callback.rb(error):58 Error details. Message: Probing file system with UUID b83b59d6-e566-4b9f-9df4-64c542d7bb73 failed. What: Command failed: "/usr/bin/lsattr -d '/tmp/libstorage-0BdG4e/tmp-mount-QnwgYf/@'".
2022-02-17 09:05:06 <1> susetest(6137) [libstorage] CallbacksImpl.cc(error_callback):193 user decides to continue after error


## Reproducible

Fails since (at least) Build [98.1](https://openqa.suse.de/tests/8181666) (current job)


## Expected result

Last good: (unknown) (or more recent)


## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Regression-on-Migration-from-SLE15-SPx&machine=s390x-kvm-sle12&test=offline_sles15sp3_pscc_basesys-srv-desk-dev-contm-lgm-tsm-wsm-pcm_all_full_console&version=15-SP4)
Comment 1 WEI GAO 2022-02-18 03:16:10 UTC
Created attachment 856285 [details]
b
Comment 2 WEI GAO 2022-02-18 03:16:28 UTC
Created attachment 856286 [details]
c
Comment 3 Ancor Gonzalez Sosa 2022-02-18 09:14:25 UTC
libstorage-ng is reporting an error during probing. Arvin will likely be able to quickly spot why:

> SystemCmd("/usr/bin/lsattr -d '/tmp/libstorage-0BdG4e/tmp-mount-QnwgYf/@'")
> SystemCmd.cc(addLine):569 Adding Line 1 "runtime/cgo: pthread_create failed:
>     Resource temporarily unavailable"
> SystemCmd.cc(getUntilEOF):535 pid:6427 added lines:1 stderr:true
> 
> THROW:  Command failed: "/usr/bin/lsattr -d '/tmp/libstorage-0BdG4e/tmp-mount-QnwgYf/@'"
> CAUGHT: Command failed: "/usr/bin/lsattr -d '/tmp/libstorage-0BdG4e/tmp-mount-QnwgYf/@'"
> RETHROW:Command failed: "/usr/bin/lsattr -d '/tmp/libstorage-0BdG4e/tmp-mount-QnwgYf/@'"
> 
> MountableImpl.cc(~EnsureMounted):619 ~EnsureMounted BtrfsSubvolume sid:52
>    displayname:'top level' id:5 path: default-btrfs-subvolume:true
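
A minimal sketch, not part of libstorage-ng, of how the stderr captured for a failed command could be checked for the Go runtime signature seen in the log above. The function name `classify_stderr` and the two labels it returns are hypothetical and exist only for illustration.

```python
import re

# Signature copied from the log above: the Go runtime could not start an OS thread.
CGO_THREAD_FAILURE = re.compile(r"runtime/cgo: pthread_create failed")


def classify_stderr(lines):
    """Return 'thread-exhaustion' if any stderr line carries the cgo signature, else 'other'."""
    for line in lines:
        if CGO_THREAD_FAILURE.search(line):
            return "thread-exhaustion"
    return "other"
```

A classifier like this could help separate genuine lsattr failures from cases where the message comes from somewhere else entirely.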
Comment 4 Arvin Schnell 2022-02-21 14:27:26 UTC
That is a strange error message. It looks like a problem that can happen with
go when the system is under stress: https://github.com/golang/go/issues/24484
But where is go here in the loop?

Is the system under stress during the failure? What does /bin/sh finally point
to?
Comment 5 WEI GAO 2022-02-23 03:59:45 UTC
(In reply to Arvin Schnell from comment #4)
> That is a strange error message. It looks like a problem that can happen with
> go when the system is under stress: https://github.com/golang/go/issues/24484
> But where is go here in the loop?
> 
> Is the system under stress during the failure? What does /bin/sh finally point
> to?

Normally, when a new build is triggered on openQA, all test cases run under stress.
This issue does not happen 100% of the time, so it may be related to a performance issue.
For example, the following result is good:
https://openqa.suse.de/tests/8208919#
Comment 6 Ancor Gonzalez Sosa 2022-02-23 10:46:20 UTC
If this is a sporadic problem caused by calling the "/usr/bin/lsattr" command (which libstorage-ng does) and related to Go, keeping it assigned to YaST2 (which is not involved at all in any of the above) will not help to move the solution forward. So reassigning.
Comment 7 Jan Kara 2022-02-24 09:53:18 UTC
(In reply to Ancor Gonzalez Sosa from comment #6)
> If this is a sporadic problem caused by calling the "/usr/bin/lsattr" command
> (which libstorage-ng does) and related to Go, keeping it assigned to YaST2
> (which is not involved at all in all the above) will not help to move the
> solution further. So reassigning.

Yeah, but I'm not the right guy either because, as far as I understand, lsattr didn't even get executed. The message "runtime/cgo: pthread_create failed: Resource temporarily unavailable" definitely does not come from lsattr(1). Given Arvin's answer, I guess it is unexpected that runtime/cgo even gets called, so the question we need to answer is IMHO: "How come SystemCmd() from libstorage-ng ended up executing runtime/cgo?"

The failure of runtime/cgo is likely just a consequence of running out of memory inside the test VM (which likely explains the intermittent nature of the failure) but I guess that's a secondary question after we figure out why it even gets called.
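
Besides memory pressure, pthread_create(3) also documents EAGAIN ("Resource temporarily unavailable") for the case where a system-imposed limit on the number of threads is hit. A minimal sketch, assuming a Linux test VM with procfs mounted, for collecting those limits when the failure occurs (the function names are hypothetical, and memory exhaustion itself is not covered here):

```python
import resource


def read_int(path):
    """Return the integer stored in a procfs file, or None if it is unreadable or not numeric."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def thread_limits():
    """Collect the limits that pthread_create(3) names as possible causes of EAGAIN."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "rlimit_nproc_soft": soft,  # per-user limit on processes and threads
        "rlimit_nproc_hard": hard,
        "threads_max": read_int("/proc/sys/kernel/threads-max"),  # system-wide thread limit
        "pid_max": read_int("/proc/sys/kernel/pid_max"),  # upper bound on PIDs
    }
```

Comparing these values with the number of running threads at the time of the failure could show whether a limit, rather than memory, is what the VM runs into.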
Comment 8 Ancor Gonzalez Sosa 2022-02-24 10:40:36 UTC
We have at least three cases of "runtime/cgo: pthread_create failed: Resource temporarily unavailable".

I'm marking all bugs as duplicate of bug#1196326 to concentrate on a common front.

*** This bug has been marked as a duplicate of bug 1196326 ***
Comment 9 WEI GAO 2022-03-10 06:24:00 UTC
https://openqa.suse.de/tests/8222012 shows the result is OK.