Bugzilla – Bug 1217456
normalize log format from RAs with one of Pacemaker
Last modified: 2024-06-11 10:42:24 UTC
For years, the RA log format has differed from that of pacemaker itself, which makes it pretty hard to sort lines. An example:

---%>---
IPaddr2(p_ip_sllqsaph18ht)[12322]: 2023/11/23_06:27:08 INFO: Adding inet address 10.6.1.38/21 with broadcast address 10.6.7.255 to device eth0 (with label eth0:h18ht)
^^ RA example in pacemaker.log

Nov 22 21:29:18.765 slvqsaph18sn1 pacemaker-controld [42456] (do_state_transition) notice: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
^^ pacemaker component in pacemaker.log
---%<---

Upstream pushed a change in https://github.com/ClusterLabs/resource-agents/pull/936, but it clearly caused some compatibility discussion (see the PR discussion). We reverted that change in https://build.suse.de/package/view_file/SUSE:SLE-15-SP5:Update/resource-agents/0006-Revert-ocf_log-use-same-log-format-as-pacemaker.patch?expand=1 .

Could we do something about this after all these years? I'm opening this bug to get it on the radar of the RA maintainers and other people, so that we can see whether it's fine to consolidate with upstream now...
Plus, RAs don't use milliseconds (?), which is another difference.
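To illustrate why the mixed formats are hard to sort, here is a minimal Python sketch that parses both timestamp styles from the example lines above and orders the lines chronologically. The regular expressions, the function names, and the `default_year` parameter are assumptions made for this illustration only; the pacemaker-style timestamp carries no year, so one has to be supplied from outside. It also assumes an English locale for the month abbreviation.

```python
import re
from datetime import datetime

# RA style: "IPaddr2(res)[12322]: 2023/11/23_06:27:08 INFO: message"
RA_RE = re.compile(
    r"^\S+\[\d+\]:\s+(?P<ts>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2})\s+\w+:\s"
)
# Pacemaker style: "Nov 22 21:29:18.765 host daemon [pid] (func) level: message"
PCMK_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\.\d{3})\s"
)


def parse_timestamp(line, default_year):
    """Return a datetime for either log style, or None if neither matches."""
    m = RA_RE.match(line)
    if m:
        # RA lines have a year but only second resolution.
        return datetime.strptime(m.group("ts"), "%Y/%m/%d_%H:%M:%S")
    m = PCMK_RE.match(line)
    if m:
        # Pacemaker lines have milliseconds but no year (assumption: the
        # caller knows the year, e.g. from the log file's mtime).
        ts = " ".join(m.group("ts").split())
        return datetime.strptime(f"{default_year} {ts}", "%Y %b %d %H:%M:%S.%f")
    return None


def sort_lines(lines, default_year):
    """Sort parseable lines by timestamp; unparseable lines are dropped."""
    keyed = []
    for line in lines:
        ts = parse_timestamp(line, default_year)
        if ts is not None:
            keyed.append((ts, line))
    return [line for _, line in sorted(keyed, key=lambda pair: pair[0])]


RA_LINE = (
    "IPaddr2(p_ip_sllqsaph18ht)[12322]: 2023/11/23_06:27:08 INFO: "
    "Adding inet address 10.6.1.38/21 with broadcast address 10.6.7.255 "
    "to device eth0 (with label eth0:h18ht)"
)
PCMK_LINE = (
    "Nov 22 21:29:18.765 slvqsaph18sn1 pacemaker-controld [42456] "
    "(do_state_transition) notice: State transition S_TRANSITION_ENGINE -> "
    "S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd"
)

ordered = sort_lines([RA_LINE, PCMK_LINE], default_year=2023)
```

With a unified format, none of the per-style parsing would be needed and a plain textual sort within one year would already work.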
Thanks for bringing this up, Jiri.

It sounds sensible if this could comply with pacemaker's convention and also be consistent with the resource-agents upstream.

Fabian/Lars, does the logging format of RAs matter for any SAP tools/consumers?

Xin, does it matter for crmsh, for example the "history" feature and so on?
(In reply to Yan Gao from comment #2)
> Thanks for bringing this up, Jiri.
>
> It sounds sensible if this could comply with pacemaker's convention and also
> be consistent with the resource-agents upstream.
>
> Fabian/Lars, does the logging format of RAs matter for any SAP
> tools/consumers?
>
> Xin, does it matter for crmsh, for example the "history" feature and so on?

Hi Yan et al,

1. We are talking about the RA logging into pacemaker.log? We are not talking about the RA logging into syslog messages?

2. I do not know exactly which specific tools customers are using. I am quite sure there are tools and self-made scripts for scanning at least syslog messages. We might involve our alliance managers to ask hyperscalers and system integrators.

3. What is our own QA doing with logs?

4. For the SAPHanaSR RAs we use the same time format as syslog, including milliseconds. This is useful for correlating RA events with the overall system. The root cause usually comes from outside the pacemaker cluster. From that perspective it might be an idea to normalize pacemaker with the rest of the logging facilities.

5. Sidenote: I personally prefer the general syslog timestamp format. It is easier to parse. The syslog timestamp appears more logical to me than the pacemaker one. The pacemaker format is like US (Nov 24 00:00:01.113), syslog is like the rest of the world and more precise (2023-11-24T08:58:43.063675+01:00).

6. In theory OCF RAs are not bound to pacemaker. They were designed in heartbeat2 times and later used with openais. I do not know if they are used on other cluster backends nowadays.

Just my 2 ct.

Regards,
Lars
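As a small illustration of the parsing difference between the two timestamp styles mentioned in point 5, here is a hedged Python sketch. The function names and the `year` and `tz` parameters are hypothetical and exist only for this example. The syslog-style value is self-describing (year and UTC offset included), while the pacemaker-style value needs both supplied by the caller. An English locale is assumed for the month abbreviation.

```python
from datetime import datetime, timezone

SYSLOG_TS = "2023-11-24T08:58:43.063675+01:00"
PACEMAKER_TS = "Nov 24 00:00:01.113"


def parse_syslog(ts):
    """Parse an ISO 8601 style timestamp; year and offset come from the string."""
    return datetime.fromisoformat(ts)


def parse_pacemaker(ts, year, tz):
    """Parse a pacemaker.log style timestamp.

    The string has neither year nor time zone, so both are assumptions
    that the caller must provide.
    """
    naive = datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S.%f")
    return naive.replace(tzinfo=tz)


syslog_dt = parse_syslog(SYSLOG_TS)
pacemaker_dt = parse_pacemaker(PACEMAKER_TS, year=2023, tz=timezone.utc)
```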
Hi Lars,

(In reply to Lars Pinne from comment #4)
> 1. We are talking about the RA logging into pacemaker.log?

Yes, it's only about logging into pacemaker.log.

> We are not talking about the RA logging into syslog messages?

Right. The RA logging format in syslog has always been the same, which follows the convention there. For example:

2023-11-24T09:22:56.594816+01:00 sle15sp5-1 IPaddr2(r1005)[2559]: INFO: Bringing device eth0 up

> 4. For the SAPHanaSR RAs we use the same time format as syslog, including
> milliseconds. This is useful for correlating RA events with the overall
> system. The root cause usually comes from outside the pacemaker cluster.
> From that perspective it might be an idea to normalize pacemaker with the
> rest of the logging facilities.
>
> 5. Sidenote: I personally prefer the general syslog timestamp format. It is
> easier to parse. The syslog timestamp appears more logical to me than the
> pacemaker one. The pacemaker format is like US
> (Nov 24 00:00:01.113), syslog is like the rest of the world and more precise
> (2023-11-24T08:58:43.063675+01:00).

It's indeed very diverse. It can be very different even for syslog itself. For instance, what ends up in /var/log/messages is determined by syslogd/rsyslog (or even user-defined formats), and their default formats have changed over the years. Meanwhile, journalctl has its own default format, which is different from the rsyslog default ... And for the case of pacemaker, it's rather determined/limited by libqb.

I guess it would be easier if there were a consensus on what the commonly agreed standard could be ...

> 6. In theory OCF RAs are not bound to pacemaker. They were designed in
> heartbeat2 times and later used with openais. I do not know if they
> are used on other cluster backends nowadays.

Right, or rather whether they are used by any cluster resource managers other than pacemaker.
AFAIK, HAKube :)

The fact is, with the SUSE-specific patch for the logging function in resource-agents, we have been inconsistent with upstream for six years. So this is probably only about us and whether we'd like to go with upstream.
Do we have a decision? Shall I remove the corresponding patch? If yes: Which SLES versions are affected?
Peter, IMO, SLE16 will be a good opportunity for us to comply with upstream on this.

Not sure if it makes sense to similarly apply the patch only for releases < 16 with something like:

%if 0%{?suse_version} < 1600
%patch -P 6 -p1
%endif

So that we could maintain a single unified source package for both SLE16 and SLE15, at least for the time being...
This is an autogenerated message for OBS integration: This bug (1217456) was mentioned in https://build.opensuse.org/request/show/1179621 Factory / resource-agents
https://build.suse.de/request/show/334675