Bugzilla – Bug 1224464
[Build 91.1] system gets stuck and fails to collect kdump core after triggering a crash
Last modified: 2024-06-14 17:03:39 UTC
Created attachment 874963 [details]
serial logs

## The issue is similar to https://bugzilla.suse.com/show_bug.cgi?id=1218180, but I am not sure whether it is a regression in build 91.1

So far, the issue has only been seen on the ppc64le platform.

Kernel: "6.4.0-150600.21-default"
Memory: 4gb/8gb
kdump memory: 1gb

Steps to reproduce the issue

>1. Enable kdump service with kdump memory=1024m
>2. trigger system crash "echo c > /proc/sysrq-trigger"

Expected result:
system can collect crash dump file and reboot

Actual result:
system hangs and fails to reboot

Please refer to the attached files for the last screenshot and the full serial logs.

Please feel free to let me know if dev needs access to my setup.

## openQA Observation [automation tests]

openQA test in scenario sle-15-SP6-Online-ppc64le-toolchain_zypper@ppc64le fails in
[kdump_and_crash](https://openqa.suse.de/tests/14365468/modules/kdump_and_crash/steps/83)

## Test suite description
Maintainer: QE Core, mnowak
Install toolchain packages and test the toolchain. Uses a more powerful machine configuration.

## Reproducible
Fails since (at least) Build [91.1](https://openqa.suse.de/tests/14338091)

## Expected result
Last good: [90.1](https://openqa.suse.de/tests/14305219) (or more recent)

## Further details
Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Online&machine=ppc64le&test=toolchain_zypper&version=15-SP6)
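Before triggering the crash in step 2, it can help to confirm that a capture kernel is actually loaded and how much memory is reserved for it. The following is a minimal sketch, not part of the original report: it assumes the standard sysfs files /sys/kernel/kexec_crash_loaded and /sys/kernel/kexec_crash_size are present, and the helper name `kdump_state` is made up for illustration.

```python
from pathlib import Path
from typing import Tuple


def kdump_state(sys_kernel: Path = Path("/sys/kernel")) -> Tuple[bool, int]:
    """Return (capture kernel loaded?, reserved bytes) as reported by sysfs."""
    # kexec_crash_loaded contains "1" once a capture kernel has been loaded.
    loaded = (sys_kernel / "kexec_crash_loaded").read_text().strip() == "1"
    # kexec_crash_size contains the size of the crashkernel reservation in bytes.
    reserved = int((sys_kernel / "kexec_crash_size").read_text().strip())
    return loaded, reserved


if __name__ == "__main__":
    try:
        loaded, reserved = kdump_state()
        print(f"capture kernel loaded: {loaded}, "
              f"reserved: {reserved // (1024 * 1024)} MiB")
    except (OSError, ValueError) as exc:
        # The files are missing if the kernel was built without kexec/kdump support.
        print(f"cannot read kdump state: {exc}")
```

If the capture kernel is not reported as loaded, "echo c > /proc/sysrq-trigger" would not be expected to produce a dump at all, which is a different failure from the hang described here.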
Created attachment 874964 [details] screen shot
I can capture more console logs if I wait more than 5 minutes. Please see the attached file.
Created attachment 874967 [details] screen shot after 5 minutes
(In reply to Richard Fan from comment #3)
> Created attachment 874967 [details]
> screen shot after 5 minutes

(In reply to Richard Fan from comment #0)
> Created attachment 874963 [details]
> serial logs
> [...]

This is consistently failing (even with 16 GB of RAM, see https://openqa.suse.de/tests/14415084#step/kdump_and_crash/83), but it is passing for the kernel team.
OK, so this seems to be bsc#1161421 again:
- Passes https://openqa.suse.de/tests/14415136 - CRASH_MEMORY 2048
- Fails https://openqa.suse.de/tests/14415419 - CRASH_MEMORY 1200
@Santiago Zarate,

The kernel tests most likely pass because only 1 VCPU is assigned (see the job setting 'QEMUCPUS=1'). With 4 VCPUs assigned, the issue can be reproduced there as well:
>http://openqa.suse.de/tests/overview?build=rfan0523_kernel&distri=sle&version=15-SP6

Conversely, if we set 'QEMUCPUS=1' for the qe-core tests, the issue is gone:
>http://openqa.suse.de/tests/overview?version=15-SP6&build=rfan0523&distri=sle

----------------
Do we need more crash dump memory if more VCPUs are assigned to the VM in this case?
This is strange. Kdump does require a little more reserved memory when more CPUs are configured for the kdump environment (using KDUMP_CPUS in /etc/sysconfig/kdump). But the default is KDUMP_CPUS=1, and I checked that it has not been changed in your qemu image.

So is the statement from Comment #5, that CRASH_MEMORY 2048 passes and CRASH_MEMORY 1200 fails, correct?

I tried with your qemu image (on x86-64, using qemu full emulation) and it works for me with either 1 or 4 CPUs, with the default crashkernel=460M.

Is there a way I could manually play with a failing system?
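To compare a passing and a failing job, the two settings discussed above can be read from each system and put side by side. This is a rough sketch under stated assumptions: the function names `crashkernel_param` and `sysconfig_value` are hypothetical, the parsing assumes a plain `KEY="value"` layout in /etc/sysconfig/kdump, and the crashkernel= value is returned as-is without interpreting range syntax.

```python
from pathlib import Path
from typing import Optional


def crashkernel_param(cmdline: str) -> Optional[str]:
    """Return the raw crashkernel= value from a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token[len("crashkernel="):]
    return None


def sysconfig_value(text: str, key: str) -> Optional[str]:
    """Return the value assigned to `key` in sysconfig-style text, or None."""
    for line in text.splitlines():
        line = line.strip()
        # Skip comments and lines that are not assignments.
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop surrounding quotes, e.g. KDUMP_CPUS="1" -> 1.
            return value.strip().strip("\"'")
    return None


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
        config = Path("/etc/sysconfig/kdump").read_text()
    except OSError as exc:
        print(f"cannot read kdump settings: {exc}")
    else:
        print("crashkernel =", crashkernel_param(cmdline))
        print("KDUMP_CPUS  =", sysconfig_value(config, "KDUMP_CPUS"))
```

Running this on the 1-VCPU and 4-VCPU jobs would show whether the reservation or KDUMP_CPUS actually differs between them, or whether only the VCPU count of the crashed system changes.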