Bugzilla – Bug 1218923
[Build 134.1] segfault in gnome-session-b on ppc64le
Last modified: 2024-01-19 11:16:56 UTC
* Platform and arch: ppc64le (KVM)
* OS Version: SLES for SAP 15-SP5 QU2 RC1 - Build 134.1
  - Installed via: ISO
  - Using manual procedure via openQA job (https://openqa.suse.de/tests/13272276)
* LOGS to be attached:
  - y2logs
  - journal
  - dmesg
  - rpm -qa output
  - os-autoinst-distri-opensuse problem_detection_logs tarball
  - /etc/sysconfig tarball
  - Xlogs.system.log
* Results
  * Expected: no segfaults are detected in the logs
  * Real: a gnome-session-b segfault is detected in the journal:

    /var/log/messages:2024-01-17T09:54:13.750058-05:00 localhost kernel: [ 83.745362][ T2921] gnome-session-b[2921]: segfault (11) at aaaaaaaaaaaaaaaa nip 11501df50 lr 11501df3c code 3 in gnome-session-binary[114ff0000+70000]
    /var/log/messages-2024-01-17T09:54:13.750065-05:00 localhost kernel: [ 83.745382][ T2921] gnome-session-b[2921]: code: fbe1fff8 60000000 7c7f1b78 ebc29522 f8010010 f821ffd1 4bfff0d1 2fbf0000
    /var/log/messages-2024-01-17T09:54:13.750067-05:00 localhost kernel: [ 83.745387][ T2921] gnome-session-b[2921]: code: 419e0050 e93f0000 2fa90000 419e0010 <e9290000> 7fa34800 419e001c 7c641b78

    More details at: https://openqa.suse.de/tests/13272287#step/check_logs/51
  * Reproducible: yes
  * Way to reproduce it:
    - Install SLES for SAP from the 15-SP5 QU2 RC1 media
    - Register the system to SCC during installation and confirm that the modules Basesystem, Server Applications, Desktop Applications, High Availability Extension and Python3 Module are enabled.
    - Select the SLES for SAP Applications system role
    - Enable Remote Desktop Protocol service during installation
    - Boot into gnome
    - Shut down and reboot
* openQA
  * Link to failed test: https://openqa.suse.de/tests/13272287#step/check_logs/51
  * Link to last successful run: https://openqa.suse.de/tests/11163396 (non QU media. GM result)
* Description: as described above, we're detecting a gnome segfault in the HanaSR test (https://openqa.suse.de/tests/13272287#step/check_logs/51) and in the NetWeaver Cluster test (https://openqa.suse.de/tests/13272284#step/check_logs/47).

  Messages are exactly the same, down to the time and date, in both tests, which means they must have occurred during the installation job (https://openqa.suse.de/tests/13272276), which is shared by both scenarios. The HanaSR and NetWeaver tests themselves show no issues, but at the end of the test we have a test module scheduled which checks for segfaults in the logs, and this is where we're seeing this.

  The issue could be reproduced by restarting all the tests from the installation onwards. Previous results were:
  - NetWeaver Scenario: https://openqa.suse.de/tests/13271887#step/check_logs/47
  - HANA Scenario: https://openqa.suse.de/tests/13271884#step/check_logs/51

  Again, the errors are the same, but the previous scenarios have an earlier timestamp.

  Sadly, due to different infra issues, we have no previous ppc64le job in these scenarios from a time when these tests were working, so it's possible that this is not a new issue but an existing one. However, tests were passing in 15-SP5 GM.

## Reproducible

Fails since (at least) Build [115.1](https://openqa.suse.de/tests/11949893)

## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Online-QR-SAP&machine=ppc64le-sap-qam&test=sles4sap_hana_node01&version=15-SP5)
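For triage, the kernel segfault line quoted in the description can be decoded into offsets relative to the mapped binary. The following is a minimal sketch, not part of the openQA test code: the regular expression is an assumption based only on the ppc64le line in this report (other architectures print different register names), and `decode_segfault` is a hypothetical helper name.

```python
import re

# Assumed layout, taken from the ppc64le line quoted in this report:
#   comm[pid]: segfault (sig) at ADDR nip NIP lr LR code N in OBJECT[BASE+SIZE]
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault \((?P<sig>\d+)\) "
    r"at (?P<addr>[0-9a-f]+) nip (?P<nip>[0-9a-f]+) lr (?P<lr>[0-9a-f]+) "
    r"code (?P<code>\d+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the fields of a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    nip = int(m["nip"], 16)
    lr = int(m["lr"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        # Offsets of the faulting instruction and the link register
        # relative to the start of the mapping.
        "nip_offset": nip - base,
        "lr_offset": lr - base,
        "nip_in_mapping": 0 <= nip - base < size,
    }


LINE = (
    "/var/log/messages:2024-01-17T09:54:13.750058-05:00 localhost kernel: "
    "[ 83.745362][ T2921] gnome-session-b[2921]: segfault (11) at aaaaaaaaaaaaaaaa "
    "nip 11501df50 lr 11501df3c code 3 in gnome-session-binary[114ff0000+70000]"
)

info = decode_segfault(LINE)
print(hex(info["nip_offset"]), hex(info["lr_offset"]), info["nip_in_mapping"])
```

For the line in this report, the faulting instruction sits at offset 0x2df50 into the gnome-session-binary mapping, which falls inside the 0x70000-byte mapping. With matching debuginfo installed, that offset could presumably be handed to a symbolizer to locate the function.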
Created attachment 871953: dmesg
Created attachment 871954: journal
Created attachment 871955: Packages List
Created attachment 871956: /etc/sysconfig tarball
Created attachment 871957: Xlogs.system.log
Created attachment 871958: y2logs
Created attachment 871959: os-autoinst-distri-opensuse problem detection logs tarball
See hardware requirements (for the job that created the HDD): https://documentation.suse.com/sles/15-SP5/single-html/SLES-deployment/#sec-x86-requirements
> /var/log/messages:2024-01-17T09:54:13.750058-05:00 localhost kernel: [ 83.745362][ T2921] gnome-session-b[2921]: segfault (11) at aaaaaaaaaaaaaaaa nip 11501df50 lr 11501df3c code 3 in gnome-session-binary[114ff0000+70000]

This message is timestamped 2024-01-17T09:54:13. The screenshot at https://openqa.suse.de/tests/13272287#step/check_logs/1 shows the time as Jan 17 11:49:58 2024, so this error message is from the previous boot, not from this boot. I did not find the segfault in the journal log of this boot.
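The timestamp comparison above can be sketched in a few lines. This is only an illustration: it assumes the screenshot clock uses the same UTC offset (-05:00) as the log line, which the report does not state, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Timestamp and kernel uptime taken from the quoted segfault line.
LOG_TIMESTAMP = "2024-01-17T09:54:13.750058-05:00"
KERNEL_UPTIME_S = 83.745362

# Time shown in the check_logs screenshot (no timezone is shown there).
SCREENSHOT_TIME = "Jan 17 11:49:58 2024"


def message_time(timestamp):
    """Parse the ISO 8601 timestamp that prefixes the /var/log/messages line."""
    return datetime.fromisoformat(timestamp)


def boot_start(timestamp, uptime_seconds):
    """Estimate when the boot that logged the message started."""
    return message_time(timestamp) - timedelta(seconds=uptime_seconds)


def gap_to_screenshot(timestamp, screenshot):
    """Time elapsed between the log message and the screenshot.

    Assumption: the screenshot clock is in the same UTC offset as the log line.
    """
    msg = message_time(timestamp)
    shot = datetime.strptime(screenshot, "%b %d %H:%M:%S %Y").replace(tzinfo=msg.tzinfo)
    return shot - msg


print("boot that logged the segfault started at:", boot_start(LOG_TIMESTAMP, KERNEL_UPTIME_S))
print("gap between segfault and screenshot:", gap_to_screenshot(LOG_TIMESTAMP, SCREENSHOT_TIME))
```

Under that assumption, the segfault was logged almost two hours before the screenshot, about 84 seconds into a boot that started around 09:52:50. Confirming that it belongs to an earlier boot would still need the start time of the current boot, for example from `journalctl --list-boots`.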
(In reply to Santiago Zarate from comment #8)
> See hardware requirements (for the job that created the HDD):
> https://documentation.suse.com/sles/15-SP5/single-html/SLES-deployment/#sec-x86-requirements

I restarted the create_hdd job with 4G RAM, and after it was done, restarted the NetWeaver Cluster and HanaSR jobs. No segfaults were observed there. I think this can be closed as invalid. Sorry for the noise.
Segfaults are not present when the system is installed on a VM with 4G RAM and the cluster jobs run with 32G RAM.