Bugzilla – Bug 1221906
[15SP6][HA] openQA test fails in pacemaker_cts_cluster_exerciser: cts-lab hangs due to issue of `journalctl --until`
Last modified: 2024-07-12 09:50:07 UTC
- Environment Details: SLE15 SP6 recent builds, all arches
- Summary: In the openQA test case "ha_pacemaker_cts_client", the test module "pacemaker_cts_cluster_exerciser" timed out on the command "cts-lab". The command hangs because of a systemd issue in how journalctl handles logs.
- Reproduce steps:
  1. Set up the test env. The test cases are in openQA; I can help to set up the envs as https://openqa.suse.de/tests/13825589#next_previous did.
  2. Log in to "ha_pacemaker_cts_client".
  3. Run cts-lab as follows: '/usr/share/pacemaker/tests/cts-lab --nodes 'pacemaker-node01 pacemaker-node02' --stonith-type external/sbd --stonith-args pcmk_delay_max=30,pcmk_off_action=reboot,action=reboot --test-ip-base 10.0.2.20 --no-loop-tests --no-unsafe-tests --at-boot 1 --outputfile /tmp/cts_cluster_exerciser.log --once'
  4. "cts-lab" hangs, see attached pic.
  5. Some feedback from Yan Gao, FYI: We are definitely suffering from a fresh issue with systemd about journalctl handling logs here, which I believe is about this: https://github.com/systemd/systemd/issues/31776 Looks like it has just been fixed by this: https://github.com/systemd/systemd/pull/31861
Created attachment 873762 [details] screen shot for 'cts-lab'
(In reply to lili zhao from comment #0)
> 5. Some feedback from Yan Gao FYI:
> We are definitely suffering from a fresh issue with systemd about
> journalctl handling logs here, which i believe is about this:
> https://github.com/systemd/systemd/issues/31776
> Looks like it has been just fixed by this
> https://github.com/systemd/systemd/pull/31861

To clarify, it's not journalctl itself that hangs. It's cts-lab that waits for the collection of logs older than a specified timestamp to complete, and that collection never completes. Apparently `journalctl --until` no longer works correctly together with the options `--after-cursor` and `--lines`. With these options supplied all together, journalctl keeps returning logs that are newer than the specified `--until` timestamp.
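A minimal sketch of how the behaviour described above could be checked. The exact command line that cts-lab runs is not shown in this report, so the argument list below is an assumption built only from the three options named in the comment; the helper names `build_journalctl_argv` and `entries_after_until` are hypothetical. It relies on `journalctl --output=json` printing one JSON object per line with a `__REALTIME_TIMESTAMP` field in microseconds.

```python
import json
from datetime import datetime


def build_journalctl_argv(cursor, until_epoch, lines):
    """Assemble a journalctl call combining --after-cursor, --until and --lines.

    Assumption: this only mirrors the option combination named in the
    comment; it is not the command cts-lab actually runs.
    """
    # journalctl interprets a "YYYY-MM-DD HH:MM:SS" value as local time.
    until = datetime.fromtimestamp(until_epoch).strftime("%Y-%m-%d %H:%M:%S")
    return [
        "journalctl",
        "--no-pager",
        "--output=json",
        f"--after-cursor={cursor}",
        f"--until={until}",
        f"--lines={lines}",
    ]


def entries_after_until(json_lines, until_epoch):
    """Return the entries whose timestamp is later than the --until cutoff.

    A correct journalctl should never produce any; a non-empty result
    matches the symptom reported here.
    """
    cutoff_us = int(until_epoch * 1_000_000)
    late = []
    for line in json_lines.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        entry = json.loads(line)
        # __REALTIME_TIMESTAMP is a string holding microseconds since the epoch.
        if int(entry["__REALTIME_TIMESTAMP"]) > cutoff_us:
            late.append(entry)
    return late
```

On an affected system, the argument list could be passed to `subprocess.run(..., capture_output=True, text=True)` and the captured stdout handed to `entries_after_until`. Keeping the parsing separate from the process call means the check can also be run against saved journal output.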
Hi,

(In reply to lili zhao from comment #0)
> We are definitely suffering from a fresh issue with systemd about
> journalctl handling logs here, which i believe is about this:
> https://github.com/systemd/systemd/issues/31776
> Looks like it has been just fixed by this
> https://github.com/systemd/systemd/pull/31861

Would it be possible to try a test package that includes the backport of this PR?
Hi Franck Bui, I can help to create the openQA test case envs for debugging, but could you please tell me how to make the fix take effect on the VM (for example, how to do make/patch/...)?
Hi Lili,

(In reply to lili zhao from comment #4)
> I can help to create the openQA test case envs for debugging, but could you
> please tell me how to make the fix take effect on VM (for example, how to do
> make/patch/...)?

The idea is that I'll provide you with a TEST package on IBS that includes the backport of the suggested fix. You can then use it in your openQA test case to check whether it effectively fixes the reported issue.
(In reply to Franck Bui from comment #5)
> Hi Lili,
>
> (In reply to lili zhao from comment #4)
> > I can help to create the openQA test case envs for debugging, but could you
> > please tell me how to make the fix take effect on VM (for example, how to do
> > make/patch/...)?
>
> The idea is that I'll provide you a TEST package on IBS including the
> backport of the suggested fix that you'll use in your openQA test case to
> check if it's effectively fix the reported issue.

Got it, thanks!
Lili, you can find the test package by following this url: https://build.suse.de/package/show/home:fbui:systemd:SLE-15-SP6-bsc1221906/systemd Can you please give it a try ?
(In reply to Franck Bui from comment #7)
> Lili, you can find the test package by following this url:
> https://build.suse.de/package/show/home:fbui:systemd:SLE-15-SP6-bsc1221906/systemd
>
> Can you please give it a try ?

Thanks for the pkg, test cases passed with the new repo: https://openqa.suse.de/tests/13897424#step/pacemaker_cts_cluster_exerciser/33 (test results, or see attachment)
Created attachment 873884 [details] pacemaker_cts_cluster_exerciser test results (passed)
Thanks for testing Lili, I'll queue the patch for the next SP6 snapshot.
Update: This issue still exists on the latest build, Public RC candidate 82.1: https://openqa.suse.de/tests/14107506#step/pacemaker_cts_cluster_exerciser/23
Did you check that the version of systemd contains the fix ?
I did not find any changelog entry for the fix in the output of "# rpm -qi --changelog systemd | grep journalctl" or "# rpm -qi --changelog systemd | grep bsc | grep 1221906", see the attached screenshot for details.
Created attachment 874433 [details] output of rpm-qi-changelog
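The changelog check described above can be sketched as a small helper. This is an illustration only: it assumes the fix would be recorded with a "bsc#1221906" tag, which is inferred from the grep commands in the comment and not confirmed by the report, and the function name `changelog_mentions_bug` is hypothetical.

```python
import re


def changelog_mentions_bug(changelog_text, bug_id):
    """Return the changelog lines that reference bsc#<bug_id>.

    Assumption: the entry uses the "bsc#NNNNNNN" form; a changelog that
    describes the fix without that tag would not be detected.
    """
    # \b after the digits keeps longer bug numbers from matching.
    pattern = re.compile(rf"\bbsc#{bug_id}\b")
    return [
        line.strip()
        for line in changelog_text.splitlines()
        if pattern.search(line)
    ]
```

The text to scan could come from `rpm -q --changelog systemd`, captured for example with `subprocess.run`. An empty result suggests the installed package predates the fix, which is the situation reported in this comment.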
Do you have any update on this bug, @Franck?
Sorry, I thought that the fix had already been submitted, but that turned out to be wrong. I have now submitted it via sr#328293. Please give it a bit of time before it becomes available.
Update: Test cases passed on latest build 15 SP6 88.1. Thanks for the fix.
x86: https://openqa.suse.de/tests/14254651#step/pacemaker_cts_cluster_exerciser/23
ppc64le: https://openqa.suse.de/tests/14254743#step/pacemaker_cts_cluster_exerciser/23
aarch64: https://openqa.suse.de/tests/14254766#step/pacemaker_cts_cluster_exerciser/23
This is an autogenerated message for OBS integration: This bug (1221906) was mentioned in https://build.opensuse.org/request/show/1172983 Factory / systemd
Thanks Franck for the fix, so we can move the status forward now.
Thanks for the positive status, I'll close the bug then.
Verified fixed.