Bug 1218365

Summary: [Build 20231221] cmake:full 3.28 fails to build on ppc64le
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Dominique Leuenberger <dimstar>
Component: Development
Assignee: Simon Lees <simonf.lees>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: adrian.glaubitz, brad.king, dmueller, marcela.maslanova, martin.liska, mimi.vx
Version: Current
Target Milestone: ---
Hardware: PowerPC-64
OS: Other
URL: https://openqa.opensuse.org/tests/3832324/modules/libqt5_qtbase/steps/48
Whiteboard:
Found By: openQA
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---

Description Dominique Leuenberger 2023-12-22 15:34:29 UTC
## Observation

https://build.opensuse.org/package/live_build_log/openSUSE:Factory:PowerPC/cmake:full/standard/ppc64le

The test suite fails on ppc64le, repeatedly.


openQA test in scenario opensuse-Tumbleweed-DVD-ppc64le-extra_tests_on_kde@ppc64le fails in
[libqt5_qtbase](https://openqa.opensuse.org/tests/3832324/modules/libqt5_qtbase/steps/48)

## Test suite description



## Reproducible

Fails since (at least) Build [20231221](https://openqa.opensuse.org/tests/3831092)


## Expected result

Last good: [20231219](https://openqa.opensuse.org/tests/3825238) (or more recent)


## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=ppc64le&distri=opensuse&flavor=DVD&machine=ppc64le&test=extra_tests_on_kde&version=Tumbleweed)
Comment 1 John Paul Adrian Glaubitz 2024-01-02 23:43:44 UTC
Upstream bug report: https://gitlab.kitware.com/cmake/cmake/-/issues/25500
Comment 2 Simon Lees 2024-01-03 21:47:02 UTC
The cmake bug report is pointing toward libuv, adding those maintainers to cc
Comment 3 Simon Lees 2024-01-05 11:18:51 UTC
*** Bug 1218558 has been marked as a duplicate of this bug. ***
Comment 4 Marcela Maslanova 2024-01-09 08:00:45 UTC
I don't see an SR, but it seems to build fine now. Thanks!
Comment 5 Dominique Leuenberger 2024-01-09 10:41:52 UTC
https://build.opensuse.org/package/live_build_log/openSUSE:Factory:PowerPC/cmake:full/standard/ppc64le

I see no indication of this succeeding
Comment 6 Simon Lees 2024-01-10 04:16:37 UTC
I guess it started succeeding in ALP again, which again suggests this is probably being triggered by something other than cmake. It would be nice to know what changed in ALP to fix it.
Comment 7 Simon Lees 2024-01-10 04:40:39 UTC
(In reply to Dominique Leuenberger from comment #0)
> 
> ## Test suite description
> 
> 
> 
> ## Reproducible
> 
> Fails since (at least) Build
> [20231221](https://openqa.opensuse.org/tests/3831092)
> 
> 
> ## Expected result
> 
> Last good: [20231219](https://openqa.opensuse.org/tests/3825238) (or more
> recent)
> 

It seems like libuv was last updated in mid-November. Is there a chance this only landed in ppc64le sometime later? Looking at the changes around 20231219-20231221, I couldn't see anything too suspicious in the Tumbleweed snapshot emails.

The issue has been closed upstream for now, as it is not directly an issue in cmake.
Comment 8 Dominique Leuenberger 2024-01-10 08:01:23 UTC
(In reply to Simon Lees from comment #7)
> (In reply to Dominique Leuenberger from comment #0)
> > 
> > ## Test suite description
> > 
> > 
> > 
> > ## Reproducible
> > 
> > Fails since (at least) Build
> > [20231221](https://openqa.opensuse.org/tests/3831092)
> > 
> > 
> > ## Expected result
> > 
> > Last good: [20231219](https://openqa.opensuse.org/tests/3825238) (or more
> > recent)
> > 
> 
> It seems like libuv was last updated in mid November, is there a chance this
> only landed in ppc64le sometime later? Looking at the changes around
> 20231219-20231221 I couldn't see anything too suspicious in the tumbleweed
> snapshot emails.
> 
> Issue has been closed upstream for now as not an issue directly in cmake.

Well, ALP has cmake 3.27 - which was building (unreliably) on ppc64le.

TW has cmake 3.28 - which reliably fails to build.

osc jobhist openSUSE:Factory:PowerPC cmake:full standard ppc64le

On Dec 20, cmake was updated to 3.28.1
r243 | anag+factory | 2023-12-20 20:00:11 | 8db97099e2a7c15974e335873d8790d1 | 3.28.1 | rq1133366
Comment 9 Simon Lees 2024-01-11 10:07:15 UTC
(In reply to Dominique Leuenberger from comment #8)
> (In reply to Simon Lees from comment #7)
> > (In reply to Dominique Leuenberger from comment #0)
> > > 
> > > ## Test suite description
> > > 
> > > 
> > > 
> > > ## Reproducible
> > > 
> > > Fails since (at least) Build
> > > [20231221](https://openqa.opensuse.org/tests/3831092)
> > > 
> > > 
> > > ## Expected result
> > > 
> > > Last good: [20231219](https://openqa.opensuse.org/tests/3825238) (or more
> > > recent)
> > > 
> > 
> > It seems like libuv was last updated in mid November, is there a chance this
> > only landed in ppc64le sometime later? Looking at the changes around
> > 20231219-20231221 I couldn't see anything too suspicious in the tumbleweed
> > snapshot emails.
> > 
> > Issue has been closed upstream for now as not an issue directly in cmake.
> 
> Well, ALP has cmake 3.27 - which was building (unreliable) on ppc64le.
> 
> TW has cmake 3.28 - which reliably fails to build
> 
> osc jobhist openSUSE:Factory:PowerPC cmake:full standard ppc64le
> 
> On Dec 20, cmake was updated to 3.28.1
> r243 | anag+factory | 2023-12-20 20:00:11 | 8db97099e2a7c15974e335873d8790d1
> | 3.28.1 | rq1133366

TW also has libuv 1.47, whereas ALP has libuv 1.44. Currently cmake upstream believes the issue was introduced in libuv somewhere before 1.46, but so far they haven't been able to replicate the issue on their systems. Hopefully I'll get a chance to look at reproducing it soon, but I have a couple of other more urgent issues to look at first.
Comment 10 John Paul Adrian Glaubitz 2024-01-11 10:45:52 UTC
FWIW, in Debian unstable cmake's testsuite passes on ppc64el and ppc64 again so that the build succeeds:

> https://buildd.debian.org/status/package.php?p=cmake&suite=sid

However, the bug still seems to be there, since cmake reproducibly segfaults on ppc64 when trying to build LLVM:

> https://buildd.debian.org/status/fetch.php?pkg=llvm-toolchain-17&arch=ppc64&ver=1%3A17.0.6-4&stamp=1704876471&raw=0

I will try to generate a backtrace with gdb and post it here. That should help track down the issue.
Comment 11 John Paul Adrian Glaubitz 2024-01-11 14:26:06 UTC
Here is the backtrace which seems to indicate an invalid address was called for the signal handler:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
[Current thread is 1 (Thread 0x7fff811f6e60 (LWP 2635470))]
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  <signal handler called>
#2  0x00007fff82eee784 in __GI_epoll_pwait (epfd=4, events=0x7fffd3808cc8, maxevents=1024, timeout=-1, set=0x0) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:40
#3  0x00007fff83545238 in uv__io_poll (loop=0x10015e8edd0, timeout=-1) at ./src/unix/linux.c:1365
#4  0x00007fff8352aa84 in uv_run (loop=0x10015e8edd0, mode=UV_RUN_ONCE) at ./src/unix/core.c:447
#5  0x0000000132669d8c in cmExecuteProcessCommand (args=..., status=...) at ./Source/cmExecuteProcessCommand.cxx:358
#6  0x0000000132561d38 in InvokeBuiltinCommand (status=..., args=..., command=@0x133045248: 0x1326682b0 <cmExecuteProcessCommand(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, cmExecutionStatus&)>)
    at ./Source/cmState.cxx:420
#7  operator() (status=..., args=..., __closure=<optimized out>) at ./Source/cmState.cxx:430
#8  std::__invoke_impl<bool, cmState::AddBuiltinCommand(const std::string&, BuiltinCommand)::<lambda(const std::vector<cmListFileArgument>&, cmExecutionStatus&)>&, const std::vector<cmListFileArgument, std::allocator<cmListFileArgument> >&, cmExecutionStatus&> (__f=...) at /usr/include/c++/13/bits/invoke.h:61
#9  std::__invoke_r<bool, cmState::AddBuiltinCommand(const std::string&, BuiltinCommand)::<lambda(const std::vector<cmListFileArgument>&, cmExecutionStatus&)>&, const std::vector<cmListFileArgument, std::allocator<cmListFileArgument> >&, cmExecutionStatus&> (__fn=...) at /usr/include/c++/13/bits/invoke.h:114
#10 std::_Function_handler<bool(const std::vector<cmListFileArgument, std::allocator<cmListFileArgument> >&, cmExecutionStatus&), cmState::AddBuiltinCommand(const std::string&, BuiltinCommand)::<lambda(const std::vector<cmListFileArgument, std::allocator<cmListFileArgument> >&, cmExecutionStatus&)> >::_M_invoke(const std::_Any_data &, const std::vector<cmListFileArgument, std::allocator<cmListFileArgument> > &, cmExecutionStatus &) (__functor=..., __args#0=..., __args#1=...) at /usr/include/c++/13/bits/std_function.h:290
Comment 12 Brad King 2024-01-11 23:28:59 UTC
I investigated this for the upstream cmake issue:

* https://gitlab.kitware.com/cmake/cmake/-/issues/25500#note_1468188

I reproduced it on a ppc64le virtual machine running Linux kernel 6.6.6.1-default with tumbleweed 20231219.

cmake 3.28.1 with libuv 1.44.2 works, but with libuv 1.47.0 it exhibits the random crashes.

This bisects to a change introduced in libuv 1.45 to add io_uring support:

* https://github.com/libuv/libuv/pull/3952
* https://github.com/libuv/libuv/commit/d2c31f429b87b476a7f1344d145dad4752a406d4

After locally patching libuv's uv__io_uring_setup to return -1, the crashes go away.
Comment 13 Simon Lees 2024-01-12 00:20:45 UTC
I've created a libuv bug report at https://github.com/libuv/libuv/issues/4283
Comment 14 Brad King 2024-01-13 14:34:55 UTC
I opened a libuv PR to disable io_uring on linux/ppc64[le] pending further investigation, and it has been merged upstream:

* https://github.com/libuv/libuv/pull/4285

The patch could be backported to openSUSE's package(s) for libuv 1.45 and above to resolve this issue.
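
Until a patched libuv package is available, the same idea (no io_uring on ppc64 with libuv 1.45 and above) could be approximated at run time from a wrapper script. The following is only a minimal sketch under assumptions that are not stated in this thread: it assumes the installed libuv honors the `UV_USE_IO_URING` environment variable, and the helper names `needs_workaround` and `child_env` are hypothetical. It does not model libuv releases that already carry the fix, and it does not detect the libuv version itself.

```python
import os

# Machine names as reported by platform.machine() on 64-bit PowerPC Linux.
PPC64_MACHINES = {"ppc64", "ppc64le"}

# libuv 1.45 is the first release named as affected in this bug.
FIRST_AFFECTED_LIBUV = (1, 45)


def parse_version(text):
    """Turn a dotted version such as '1.47.0' into (1, 47, 0)."""
    return tuple(int(part) for part in text.split("."))


def needs_workaround(machine, libuv_version):
    """Hypothetical helper: True on ppc64/ppc64le with libuv >= 1.45."""
    return (
        machine in PPC64_MACHINES
        and parse_version(libuv_version) >= FIRST_AFFECTED_LIBUV
    )


def child_env(machine, libuv_version, base_env=None):
    """Hypothetical helper: build the environment for a child process.

    ASSUMPTION: libuv reads UV_USE_IO_URING and skips io_uring when it
    is set to "0". This variable is not mentioned in the bug report.
    """
    env = dict(os.environ if base_env is None else base_env)
    if needs_workaround(machine, libuv_version):
        env["UV_USE_IO_URING"] = "0"
    return env
```

A caller would pass the result as the `env` argument of `subprocess.run`, for instance with `platform.machine()` as the first argument and the libuv version taken from the package manager. The compile-time patch from the PR above remains the proper fix, since it does not depend on every caller setting the variable.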
Comment 15 Simon Lees 2024-01-15 09:35:39 UTC
(In reply to Brad King from comment #14)
> I opened a libuv PR to disable io_uring on linux/ppc64[le] pending further
> investigation, and it has been merged upstream:
> 
> * https://github.com/libuv/libuv/pull/4285
> 
> The patch could be backported to openSUSE's package(s) for libuv 1.45 and
> above to resolve this issue.

Thanks, I'll do that.
Comment 16 OBSbugzilla Bot 2024-01-15 09:45:01 UTC
This is an autogenerated message for OBS integration:
This bug (1218365) was mentioned in
https://build.opensuse.org/request/show/1138765 Factory / libuv
Comment 17 Dominique Leuenberger 2024-03-14 12:16:02 UTC
(In reply to OBSbugzilla Bot from comment #16)
> This is an autogenerated message for OBS integration:
> This bug (1218365) was mentioned in
> https://build.opensuse.org/request/show/1138765 Factory / libuv

This was merged a while ago, and cmake has not shown an issue on ppc64le since then.