Bug 1212698

Summary: gcc-c++-13 LTO introduces non-determinism
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Bernhard Wiedemann <bwiedemann>
Component: Development
Assignee: Richard Biener <rguenther>
Status: REOPENED ---
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: bwiedemann, matz, rguenther, tdevries
Version: Current
Target Milestone: ---
Hardware: Other
OS: All
Whiteboard:
Found By: Development
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: binaries

Description Bernhard Wiedemann 2023-06-26 06:49:44 UTC
While working on reproducible builds for openSUSE, I found that
our photoqt package has varied between -j1 and -j2 builds since at least 2023-04-06,
but only if LTO is enabled.

The diff looks like this:

--- hexdump -C RPMS.1/usr/bin/photoqt
+++ hexdump -C RPMS.2/usr/bin/photoqt
@@ -54,8 +54,8 @@
 00000350  01 00 00 00 00 00 00 00  01 00 01 c0 04 00 00 00  |................|
 00000360  09 00 00 00 00 00 00 00  02 00 01 c0 04 00 00 00  |................|
 00000370  01 00 00 00 00 00 00 00  04 00 00 00 14 00 00 00  |................|
-00000380  03 00 00 00 47 4e 55 00  66 1b d2 9c f6 1c db 53  |....GNU.f......S|
-00000390  37 de a6 7c 4a b1 00 6e  d3 87 9e b3 04 00 00 00  |7..|J..n........|
+00000380  03 00 00 00 47 4e 55 00  1c 48 9e e0 4c cb 55 03  |....GNU..H..L.U.|
+00000390  0f ee 30 57 67 00 ca cb  a5 db 33 43 04 00 00 00  |..0Wg.....3C....|
 000003a0  10 00 00 00 01 00 00 00  47 4e 55 00 00 00 00 00  |........GNU.....|
 000003b0  03 00 00 00 02 00 00 00  00 00 00 00 00 00 00 00  |................|
 000003c0  07 04 00 00 21 04 00 00  41 03 00 00 00 00 00 00  |....!...A.......|
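The changed bytes sit inside an ELF note: at offset 0x378 the dump shows namesz=4, descsz=0x14 (20), type 3 (NT_GNU_BUILD_ID) and the name "GNU", so the 20 differing bytes are the GNU build ID. With ld's --build-id, that ID is a hash computed over the linked output (20 bytes matches the SHA-1 style), so this hunk is most likely a consequence of differences elsewhere in the binary rather than their source. A minimal sketch that decodes such a note directly with POSIX od, assuming a little-endian ELF and a 20-byte ID; the function name build_id_at is hypothetical.

```sh
#!/bin/sh
# Print the build ID held in a GNU build-id ELF note that starts at byte
# offset $2 of file $1. Assumes a little-endian ELF and a 20-byte
# (SHA-1 sized) ID, as in the photoqt dump where the note starts at 0x378.
build_id_at() {
    file=$1
    off=$(($2))    # accepts decimal or 0x-prefixed hex
    # Expected header: namesz=4, descsz=0x14, type=3 (NT_GNU_BUILD_ID), "GNU\0"
    hdr=$(od -An -tx1 -v -j "$off" -N 16 "$file" | tr -d ' \n')
    if [ "$hdr" != "040000001400000003000000474e5500" ]; then
        echo "no GNU build-id note at offset $2" >&2
        return 1
    fi
    # The 20-byte descriptor right after the 16-byte header is the build ID
    od -An -tx1 -v -j $((off + 16)) -N 20 "$file" | tr -d ' \n'
    echo
}
```

For the -j1 binary shown above, `build_id_at usr/bin/photoqt 0x378` should print 661bd29cf61cdb5337dea67c4ab1006ed3879eb3. For a real binary, where the note offset is not known in advance, `readelf -n` prints the same value on its "Build ID:" line.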
Comment 1 Richard Biener 2023-06-26 11:40:44 UTC
Do you still have the two binaries, and can you attach them?
Comment 2 Richard Biener 2023-06-26 11:51:56 UTC
I can't reproduce this: after editing the .spec file to replace the make line with
make -j1 and make -j2, the binaries are identical.
Comment 3 Bernhard Wiedemann 2023-06-28 19:58:21 UTC
The difference shows up with osc build -j1 vs -j2, i.e. a 1-core VM vs a 2-core VM, which influences the scheduling of forked processes/threads.
You might be able to replicate it with taskset 1 vs 3.
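taskset's mask argument is hexadecimal, so "1" pins a process to CPU 0 and "3" allows CPUs 0 and 1, roughly approximating the 1-core and 2-core VMs on a single machine. The sketch shows the effect through nproc, which reports the CPUs available to the calling process; it assumes util-linux taskset and coreutils nproc, and the helper visible_cpus is hypothetical.

```sh
#!/bin/sh
# taskset takes a hexadecimal CPU mask: 1 = CPU 0 only, 3 = CPUs 0 and 1.
# nproc reports the CPUs available to the current process, so it reflects
# the mask, much like a 1-core vs a 2-core build VM would.
visible_cpus() {
    taskset "$1" nproc
}

for mask in 1 3; do
    echo "mask $mask: $(visible_cpus "$mask") CPU(s) available"
done
```

The same prefix can wrap a build, e.g. `taskset 1 make` vs `taskset 3 make`. One plausible channel: GCC documents that -flto=auto uses GNU make's jobserver when available and otherwise falls back to autodetecting the number of CPU threads, so the visible CPU count can change how many LTRANS jobs run in parallel. Whether that alone explains this bug is not established here.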

Here is the full reproducer script:
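#!/bin/bash
# Uses bash features: $_ (last argument of the previous command) and {1,2} brace expansion
osc co openSUSE:Factory/photoqt && cd $_
for N in 1 2 ; do
    # -j N gives the build VM N CPUs; the resulting RPMs are kept in RPMS.$N
    osc build --vm-type=kvm --noservice -j $N --keep-pkg=RPMS.$N
    unrpm RPMS.$N/photoqt-*.x86_64.rpm
    hexdump -C usr/bin/photoqt > $N.strings
done
diff -u {1,2}.strings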


I also noticed another difference:
 00548030  08 6a 54 00 00 00 00 00  00 00 00 00 00 00 00 00  |.jT.............|
 00548040  70 68 6f 74 6f 71 74 2e  64 65 62 75 67 00 00 00  |photoqt.debug...|
-00548050  76 67 8b 32 00 2e 73 68  73 74 72 74 61 62 00 2e  |vg.2..shstrtab..|
+00548050  cf 99 c4 a7 00 2e 73 68  73 74 72 74 61 62 00 2e  |......shstrtab..|
 00548060  69 6e 74 65 72 70 00 2e  6e 6f 74 65 2e 67 6e 75  |interp..note.gnu|
 00548070  2e 70 72 6f 70 65 72 74  79 00 2e 6e 6f 74 65 2e  |.property..note.|
 00548080  67 6e 75 2e 62 75 69 6c  64 2d 69 64 00 2e 6e 6f  |gnu.build-id..no|
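This second hunk appears to be the .gnu_debuglink section: the debug file name "photoqt.debug", NUL-padded to a 4-byte boundary (16 bytes), followed by a 4-byte CRC-32 of the separate debug file. A changed CRC therefore means the split-off debug information differs between the two builds. A minimal sketch that decodes such a section and computes a comparable CRC for a debug file, assuming a little-endian target; it borrows the gzip trailer for the checksum because gzip uses the same CRC-32 (polynomial 0xEDB88320) as the debuglink. Both function names are hypothetical.

```sh
#!/bin/sh
# Decode a raw .gnu_debuglink section: debug file name, NUL padding up to a
# 4-byte boundary, then a CRC-32 of the debug file (little-endian target).
debuglink_info() {
    sec=$1
    name=$(tr '\0' '\n' < "$sec" | head -n 1)
    crc_off=$(( (${#name} + 1 + 3) / 4 * 4 ))
    # Reverse the 4 little-endian bytes so the CRC prints as one hex number
    crc=$(od -An -tx1 -v -j "$crc_off" -N 4 "$sec" | awk '{ print $4 $3 $2 $1 }')
    echo "$name $crc"
}

# CRC-32 of a whole file, read from the gzip trailer (CRC-32 then size,
# both little-endian); gzip uses the same CRC-32 as the debuglink checksum.
file_crc32() {
    gzip -c < "$1" | tail -c 8 | od -An -tx1 -N 4 | awk '{ print $4 $3 $2 $1 }'
}
```

For the -j1 binary above, `dd if=usr/bin/photoqt of=link.bin bs=1 skip=$((0x548040)) count=20 2>/dev/null && debuglink_info link.bin` should print `photoqt.debug 328b6776`, and `file_crc32` run on the matching photoqt.debug from the debuginfo package should print the same CRC.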
Comment 4 Bernhard Wiedemann 2023-07-05 10:22:57 UTC
Interestingly, with the recent update to photoqt-3.3, I can no longer reproduce this issue.
Comment 5 Chenzi Cao 2023-07-06 14:38:44 UTC
Based on comment #4, I am closing this bug report. Please feel free to reopen it whenever necessary, thanks.
Comment 6 Bernhard Wiedemann 2023-07-27 04:40:47 UTC
I have now found a similar case with python-mpi4py,
which produces variations in debuginfo
unless both builds are performed in a 1-core VM
or LTO is disabled.


The call trace looks like this:
/home/abuild/rpmbuild/BUILD/mpi4py-3.1.4/build/lib.linux-x86_64-cpython-39/mpi4py/MPI.cpython-39-x86_64-linux-gnu.so written per open
    by pid=1951 dir=/home/abuild/rpmbuild/BUILD/mpi4py-3.1.4 exec="/usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld", ["/usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld", "-plugin", "/usr/lib64/gcc/x86_64-suse-linux/13/liblto_plugin.so", "-plugin-opt=/usr/lib64/gcc/x86_64-suse-linux/13/lto-wrapper", "-plugin-opt=-fresolution=/tmp/cce7PdEe.res", "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_s", "-plugin-opt=-pass-through=-lc", "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_s", "--build-id", "--eh-frame-hdr", "-m", "elf_x86_64", "-shared", "-o", "build/lib.linux-x86_64-cpython-39/mpi4py/MPI.cpython-39-x86_64-linux-gnu.so", "/usr/lib64/gcc/x86_64-suse-linux/13/../../../../lib64/crti.o", "/usr/lib64/gcc/x86_64-suse-linux/13/crtbeginS.o", "-Lbuild/temp.linux-x86_64-cpython-39", "-L/usr/lib64/mpi/gcc/openmpi4/lib64", "-L/usr/lib64/gcc/x86_64-suse-linux/13", "-L/usr/lib64/gcc/x86_64-suse-linux/13/../../../../lib64", "-L/lib/../lib64", "-L/usr/lib/../lib64", "-L/usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/lib", "-L/usr/lib64/gcc/x86_64-suse-linux/13/../../..", "build/temp.linux-x86_64-cpython-39/src/MPI.o", "-ldl", "-lmpi", "-lgcc", "--push-state", "--as-needed", "-lgcc_s", "--pop-state", "-lc", "-lgcc", "--push-state", "--as-needed", "-lgcc_s", "--pop-state", "/usr/lib64/gcc/x86_64-suse-linux/13/crtendS.o", "/usr/lib64/gcc/x86_64-suse-linux/13/../../../../lib64/crtn.o"] - started
    by pid=1950 dir=/home/abuild/rpmbuild/BUILD/mpi4py-3.1.4 exec="/usr/lib64/gcc/x86_64-suse-linux/13/collect2", ["/usr/lib64/gcc/x86_64-suse-linux/13/collect2", "-plugin", "/usr/lib64/gcc/x86_64-suse-linux/13/liblto_plugin.so", "-plugin-opt=/usr/lib64/gcc/x86_64-suse-linux/13/lto-wrapper", "-plugin-opt=-fresolution=/tmp/cce7PdEe.res", "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_s", "-plugin-opt=-pass-through=-lc", "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_s", "-flto=auto", "--build-id", "--eh-frame-hdr", "-m", "elf_x86_64", "-shared", "-o", "build/lib.linux-x86_64-cpython-39/mpi4py/MPI.cpython-39-x86_64-linux-gnu.so", "/usr/lib64/gcc/x86_64-suse-linux/13/../../../../lib64/crti.o", "/usr/lib64/gcc/x86_64-suse-linux/13/crtbeginS.o", "-Lbuild/temp.linux-x86_64-cpython-39", "-L/usr/lib64/mpi/gcc/openmpi4/lib64", "-L/usr/lib64/gcc/x86_64-suse-linux/13", "-L/usr/lib64/gcc/x86_64-suse-linux/13/../../../../lib64", "-L/lib/../lib64", "-L/usr/lib/../lib64", "-L/usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/lib", "-L/usr/lib64/gcc/x86_64-suse-linux/13/../../..", "build/temp.linux-x86_64-cpython-39/src/MPI.o", "-ldl", "-lmpi", "-lgcc", "--push-state", "--as-needed", "-lgcc_s", "--pop-state", "-lc", "-lgcc", "--push-state", "--as-needed", "-lgcc_s", "--pop-state", "/usr/lib64/gcc/x86_64-suse-linux/13/crtendS.o", "/usr/lib64/gcc/x86_64-suse-linux/13/../../../../lib64/crtn.o"] - started
    by pid=1949 dir=/home/abuild/rpmbuild/BUILD/mpi4py-3.1.4 exec="/usr/bin/gcc", ["/usr/bin/gcc", "-shared", "-O2", "-Wall", "-U_FORTIFY_SOURCE", "-D_FORTIFY_SOURCE=3", "-fstack-protector-strong", "-funwind-tables", "-fasynchronous-unwind-tables", "-fstack-clash-protection", "-Werror=return-type", "-flto=auto", "-fno-strict-aliasing", "build/temp.linux-x86_64-cpython-39/src/MPI.o", "-Lbuild/temp.linux-x86_64-cpython-39", "-ldl", "-o", "build/lib.linux-x86_64-cpython-39/mpi4py/MPI.cpython-39-x86_64-linux-gnu.so", "-I/usr/lib64/mpi/gcc/openmpi4/include", "-L/usr/lib64/mpi/gcc/openmpi4/lib64", "-lmpi"] - started
    by pid=1948 dir=/home/abuild/rpmbuild/BUILD/mpi4py-3.1.4 exec="/usr/lib64/mpi/gcc/openmpi4/bin/mpicc", ["/usr/lib64/mpi/gcc/openmpi4/bin/mpicc", "-shared", "-O2", "-Wall", "-U_FORTIFY_SOURCE", "-D_FORTIFY_SOURCE=3", "-fstack-protector-strong", "-funwind-tables", "-fasynchronous-unwind-tables", "-fstack-clash-protection", "-Werror=return-type", "-flto=auto", "-fno-strict-aliasing", "build/temp.linux-x86_64-cpython-39/src/MPI.o", "-Lbuild/temp.linux-x86_64-cpython-39", "-ldl", "-o", "build/lib.linux-x86_64-cpython-39/mpi4py/MPI.cpython-39-x86_64-linux-gnu.so"] - started
    by pid=1577 dir=/home/abuild/rpmbuild/BUILD/mpi4py-3.1.4 exec="/usr/bin/python3.9", ["/usr/bin/python3.9", "setup.py", "build", "--executable=/usr/bin/python3.9 -s", "--force"]
Comment 7 Richard Biener 2023-07-27 06:37:51 UTC
If you run into this, please attach the differing binaries.
Comment 8 Bernhard Wiedemann 2023-07-27 07:34:15 UTC
Created attachment 868443
binaries
Comment 9 Richard Biener 2023-07-27 09:47:36 UTC
I wonder if this is caused by dwz, which also does some parallel processing. The differences are all in the line number program.

If the issue reproduces reliably, can you try blocking dwz
from the build system by adding
#!BuildIgnore: dwz
to the .spec file?
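Since the differences are reported to be confined to the line number program, one way to inspect them is to diff readelf's raw dump of .debug_line for the two builds (for instance the two files in the attachment). A minimal sketch, assuming GNU binutils readelf and mktemp are available; diff_line_programs is a hypothetical name. If the line programs live in separate .debug files, pass those rather than the stripped binaries.

```sh
#!/bin/sh
# Diff the raw .debug_line (line number program) dumps of two ELF files.
# Returns diff's status: 0 if identical, 1 if they differ.
diff_line_programs() {
    a=$(mktemp) b=$(mktemp)
    readelf --debug-dump=rawline "$1" > "$a"
    readelf --debug-dump=rawline "$2" > "$b"
    diff -u "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

Swapping `--debug-dump=rawline` for `--debug-dump=decodedline` compares the decoded address-to-line table instead, which may be easier to read than the raw opcodes.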
Comment 10 Bernhard Wiedemann 2023-07-27 15:31:43 UTC
Reproduction is reliable. It seems there is a consistent result for each CPU core count, so two builds in a 4-core VM are identical, too.

Blocking dwz did not make a difference.

I tried to make a smaller reproducer, but without success.