Bug 1218923 - [Build 134.1] segfault in gnome-session-b on ppc64le
Summary: [Build 134.1] segfault in gnome-session-b on ppc64le
Status: RESOLVED INVALID
Alias: None
Product: PUBLIC SUSE Linux Enterprise Server 15 SP5
Classification: openSUSE
Component: GNOME
Version: unspecified
Hardware: PowerPC-64 Other
Priority: P5 - None, Severity: Normal
Target Milestone: ---
Assignee: E-mail List
QA Contact:
URL: https://openqa.suse.de/tests/13272287...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-17 18:14 UTC by Alvaro Carvajal
Modified: 2024-01-19 11:16 UTC
CC List: 2 users

See Also:
Found By: openQA
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---


Attachments
dmesg (40.35 KB, text/x-log)
2024-01-17 18:15 UTC, Alvaro Carvajal
journal (525.61 KB, text/x-log)
2024-01-17 18:15 UTC, Alvaro Carvajal
Packages List (73.56 KB, text/plain)
2024-01-17 18:16 UTC, Alvaro Carvajal
/etc/sysconfig tarball (55.22 KB, application/x-bzip)
2024-01-17 18:16 UTC, Alvaro Carvajal
Xlogs.system.log (26.85 KB, text/x-log)
2024-01-17 18:17 UTC, Alvaro Carvajal
y2logs (6.07 MB, application/x-bzip)
2024-01-17 18:17 UTC, Alvaro Carvajal
os-autoinst-distri-opensuse problem detection logs tarball (16.35 KB, application/x-xz)
2024-01-17 18:17 UTC, Alvaro Carvajal

Description Alvaro Carvajal 2024-01-17 18:14:19 UTC
* Platform and arch: ppc64le (KVM)

* OS Version: SLES for SAP 15-SP5 QU2 RC1 - Build 134.1
  - Installed via: ISO
  - Using manual procedure via openQA job (https://openqa.suse.de/tests/13272276)

* LOGS to be attached:
  - y2logs
  - journal
  - dmesg
  - rpm -qa output
  - os-autoinst-distri-opensuse problem_detection_logs tarball
  - /etc/sysconfig tarball
  - Xlogs.system.log

* Results
  * Expected: no segfaults are detected in the logs
  * Real: a gnome-session-b segfault is detected in the journal:

/var/log/messages:2024-01-17T09:54:13.750058-05:00 localhost kernel: [   83.745362][ T2921] gnome-session-b[2921]: segfault (11) at aaaaaaaaaaaaaaaa nip 11501df50 lr 11501df3c code 3 in gnome-session-binary[114ff0000+70000]
/var/log/messages-2024-01-17T09:54:13.750065-05:00 localhost kernel: [   83.745382][ T2921] gnome-session-b[2921]: code: fbe1fff8 60000000 7c7f1b78 ebc29522 f8010010 f821ffd1 4bfff0d1 2fbf0000 
/var/log/messages-2024-01-17T09:54:13.750067-05:00 localhost kernel: [   83.745387][ T2921] gnome-session-b[2921]: code: 419e0050 e93f0000 2fa90000 419e0010 <e9290000> 7fa34800 419e001c 7c641b78 

More details at: https://openqa.suse.de/tests/13272287#step/check_logs/51
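
For triage, the kernel line above can be decoded mechanically. The following is a minimal Python sketch and not part of the openQA test code: the function name `parse_segfault` and the regular expression are assumptions, modeled only on the ppc64le message format quoted in this report. It extracts the faulting address, `nip` and the mapping base, then computes the offset of `nip` inside `gnome-session-binary`, which is the value one would look up in a disassembly or with debug symbols.

```python
import re

# Pattern modeled on the ppc64le kernel message quoted in this report.
# This is an assumption: other architectures and kernel versions format
# the segfault line differently.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault \((?P<sig>\d+)\) "
    r"at (?P<addr>[0-9a-f]+) nip (?P<nip>[0-9a-f]+) lr (?P<lr>[0-9a-f]+) "
    r"code (?P<code>\d+) in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return a dict describing the segfault, or None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    nip = int(m["nip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "signal": int(m["sig"]),
        "fault_addr": int(m["addr"], 16),
        "object": m["obj"],
        # Offset of the faulting instruction relative to the start of the mapping.
        "nip_offset": nip - base,
        # Sanity check: the instruction pointer should lie inside the mapping.
        "nip_inside_mapping": base <= nip < base + size,
    }


SAMPLE_SEGFAULT = (
    "/var/log/messages:2024-01-17T09:54:13.750058-05:00 localhost kernel: "
    "[   83.745362][ T2921] gnome-session-b[2921]: segfault (11) at aaaaaaaaaaaaaaaa "
    "nip 11501df50 lr 11501df3c code 3 in gnome-session-binary[114ff0000+70000]"
)

info = parse_segfault(SAMPLE_SEGFAULT)
print(info["object"], hex(info["nip_offset"]))  # gnome-session-binary 0x2df50
```

For the line in this report, the faulting instruction is at offset 0x2df50 into the `gnome-session-binary` mapping.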

* Reproducible: yes

* Way to reproduce it:
  - Install SLES for SAP from the 15-SP5 QU2 RC1 media
  - Register the system during installation to SCC and confirm the modules Basesystem, Server Applications, Desktop Applications, High Availability Extension and Python3 Module are enabled during installation.
  - Select the SLES for SAP Applications system role
  - Enable Remote Desktop Protocol service during installation
  - Boot into GNOME
  - Shut down and reboot

* openQA
  * Link to failed test: https://openqa.suse.de/tests/13272287#step/check_logs/51
  * Link to last successful run: https://openqa.suse.de/tests/11163396 (non QU media. GM result)

* Description: as described above, we're detecting a gnome-session-b segfault in the HanaSR test (https://openqa.suse.de/tests/13272287#step/check_logs/51) and in the NetWeaver Cluster test (https://openqa.suse.de/tests/13272284#step/check_logs/47). The messages are exactly the same in both tests, down to the time and date, which means they must have been logged during the installation job (https://openqa.suse.de/tests/13272276), which is shared by both scenarios. The HanaSR and NetWeaver tests themselves show no issues, but a test module scheduled at the end of each test checks the logs for segfaults, and this is where the failure shows up. The issue could be reproduced by restarting all the tests from the installation onwards. Previous results were:

NetWeaver Scenario: https://openqa.suse.de/tests/13271887#step/check_logs/47
HANA Scenario: https://openqa.suse.de/tests/13271884#step/check_logs/51

Again, errors are the same, but the previous scenarios have an earlier timestamp.

Sadly, due to different infra issues, we have no previous ppc64le job in these scenarios from a time when these tests were working, so it's possible that this is not a new issue but an existing one. However, tests were passing in 15-SP5 GM.

## Reproducible

Fails since (at least) Build [115.1](https://openqa.suse.de/tests/11949893)

## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Online-QR-SAP&machine=ppc64le-sap-qam&test=sles4sap_hana_node01&version=15-SP5)
Comment 1 Alvaro Carvajal 2024-01-17 18:15:39 UTC
Created attachment 871953 [details]
dmesg
Comment 2 Alvaro Carvajal 2024-01-17 18:15:56 UTC
Created attachment 871954 [details]
journal
Comment 3 Alvaro Carvajal 2024-01-17 18:16:14 UTC
Created attachment 871955 [details]
Packages List
Comment 4 Alvaro Carvajal 2024-01-17 18:16:43 UTC
Created attachment 871956 [details]
/etc/sysconfig tarball
Comment 5 Alvaro Carvajal 2024-01-17 18:17:02 UTC
Created attachment 871957 [details]
Xlogs.system.log
Comment 6 Alvaro Carvajal 2024-01-17 18:17:19 UTC
Created attachment 871958 [details]
y2logs
Comment 7 Alvaro Carvajal 2024-01-17 18:17:45 UTC
Created attachment 871959 [details]
os-autoinst-distri-opensuse problem detection logs tarball
Comment 8 Santiago Zarate 2024-01-17 23:38:43 UTC
See hardware requirements (for the job that created the HDD): https://documentation.suse.com/sles/15-SP5/single-html/SLES-deployment/#sec-x86-requirements
Comment 9 xiaoguang wang 2024-01-18 01:35:27 UTC
> /var/log/messages:2024-01-17T09:54:13.750058-05:00 localhost kernel: [   83.745362][ T2921] gnome-session-b[2921]: segfault (11) at aaaaaaaaaaaaaaaa nip 11501df50 lr 11501df3c code 3 in gnome-session-binary[114ff0000+70000]
The timestamp of this message is 2024-01-17T09:54:13. In the screenshot at https://openqa.suse.de/tests/13272287#step/check_logs/1, the time is Jan 17 11:49:58 2024, so this error message is from the previous boot, not from this boot. I didn't find the segfault in the journal log of this boot.
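
The comparison above can be sketched in Python. This is only an illustration under stated assumptions: the function name `boot_time_of` is hypothetical, the bracketed kernel value (`83.745362`) is taken to be seconds since boot, and the syslog timestamp is assumed to have been written at essentially the same moment as the kernel event. Subtracting the uptime from the timestamp estimates when the boot that logged the segfault started. That estimate can then be compared with the start of the current boot, for example as listed by `journalctl --list-boots`.

```python
import re
from datetime import datetime, timedelta

# Matches the syslog timestamp and the kernel uptime stamp of a
# /var/log/messages line in the format quoted in this report (an assumption
# about the log format, not a general syslog parser).
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}[+-]\d{2}:\d{2}) "
    r"\S+ kernel: \[\s*(?P<sec>\d+)\.(?P<usec>\d{6})\]"
)


def boot_time_of(line):
    """Estimate the start time of the boot during which `line` was logged."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    logged_at = datetime.fromisoformat(m["ts"])
    # Build the uptime from integer parts to avoid float rounding.
    uptime = timedelta(seconds=int(m["sec"]), microseconds=int(m["usec"]))
    return logged_at - uptime


SAMPLE_LINE = (
    "/var/log/messages:2024-01-17T09:54:13.750058-05:00 localhost kernel: "
    "[   83.745362][ T2921] gnome-session-b[2921]: segfault (11) at aaaaaaaaaaaaaaaa "
    "nip 11501df50 lr 11501df3c code 3 in gnome-session-binary[114ff0000+70000]"
)

print(boot_time_of(SAMPLE_LINE).isoformat())  # 2024-01-17T09:52:50.004696-05:00
```

If the current boot started later than this estimate, the message belongs to an earlier boot.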
Comment 10 Alvaro Carvajal 2024-01-19 11:15:50 UTC
(In reply to Santiago Zarate from comment #8)
> See hardware requirements (for the job that created the HDD):
> https://documentation.suse.com/sles/15-SP5/single-html/SLES-deployment/#sec-x86-requirements

I restarted the create_hdd job with 4G RAM, and after it was done, restarted the NetWeaver Cluster and HanaSR jobs. No segfaults were observed there.

I think this can be closed as invalid. Sorry for the noise.
Comment 11 Alvaro Carvajal 2024-01-19 11:16:56 UTC
Segfaults are not present when the system is installed on a VM with 4G RAM, and the cluster jobs run with 32G RAM.