Bugzilla – Bug 1226114
MI300A: rasdaemon: Error logs are not captured in rasdaemon upon error injection
Last modified: 2024-07-19 10:37:20 UTC
This bug is based on the JIRA issue https://jira.suse.com/browse/AMD-133

From the analysis, it seems that the SLES15 kernel has incorporated the following two commits:
  [PATCH] tracing/ring-buffer: Have polling block on watermark (kernel.org)
  [RFC PATCH 1/1] tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw (kernel.org)

The packaged rasdaemon, however, as hinted by yghannam, lacks the following commit:
  rasdaemon: Fix poll() on per_cpu trace_pipe_raw blocks indefinitely (mchehab/rasdaemon@6986d81, GitHub)

As a result, the buffer_percent file in tracefs (/sys/kernel/debug/tracing/instances/rasdaemon/buffer_percent) retains its default value of 50. Consequently, the poll() on per_cpu/cpuX/trace_pipe_raw in tracefs blocks indefinitely, and rasdaemon does not output the decoded error information.

Workaround: rasdaemon can be used on SLES15-SP5 with the following commands:
  $ echo 0 > /sys/kernel/debug/tracing/instances/rasdaemon/buffer_percent
  $ systemctl restart rasdaemon.service

After this, rasdaemon captures the logs (attached in AMD-133). With this workaround, rasdaemon should log the decoded error information in the journal:
  journalctl -f -u rasdaemon.service

Please note that this issue is only present in the packaged version of rasdaemon, i.e. 0.6.7. It should not be present in the latest version of rasdaemon, i.e. 0.8.0.

Based on the above, SUSE has to backport the following patch:
  rasdaemon: Fix poll() on per_cpu trace_pipe_raw blocks indefinitely (mchehab/rasdaemon@6986d81, GitHub)
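The workaround above can be wrapped in a small shell helper that only writes to buffer_percent when the value is not already 0. This is a minimal sketch, not something taken from AMD-133: the function name apply_workaround and the BUFFER_PERCENT variable are hypothetical, and only the tracefs path and the systemctl command come from this report.

```shell
#!/bin/sh
# Sketch of the buffer_percent workaround described in this bug.
# The tracefs path is the one quoted in the report; the variable and
# function names are illustrative assumptions.
BUFFER_PERCENT="${BUFFER_PERCENT:-/sys/kernel/debug/tracing/instances/rasdaemon/buffer_percent}"

# apply_workaround FILE
# Sets FILE to 0 if it currently holds another value (the default is 50).
# Returns non-zero if FILE does not exist or cannot be written.
apply_workaround() {
    file="$1"
    if [ ! -e "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    current=$(cat "$file")
    if [ "$current" != "0" ]; then
        echo 0 > "$file" || return 1
        echo "buffer_percent changed from $current to 0"
    else
        echo "buffer_percent already 0"
    fi
}

# Typical use, as root (not executed here):
#   apply_workaround "$BUFFER_PERCENT" && systemctl restart rasdaemon.service
#   journalctl -f -u rasdaemon.service
```

The helper takes the file as an argument so it can be tried against a scratch file before touching tracefs. The setting is a runtime tracefs value, so it would presumably need to be reapplied after a reboot until the fixed rasdaemon package is installed.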
Hi,

The patch below has been accepted upstream in https://github.com/mchehab/rasdaemon/:
  ced615c rasdaemon: Add error decoding for MCA_CTL_SMU extended bits

Please backport the pending patch mentioned in AMD-133.
Hi,

There is a minor enhancement patch for the already upstreamed rasdaemon patch "ced615c rasdaemon: Add error decoding for MCA_CTL_SMU extended bits". The enhancement patch is:
  73d8177 rasdaemon: mce-amd-smca: Optimizing decoding of MCA_CTL_SMU bits

Could you please merge the patches below?

For polling and capturing logs:
  6986d81 rasdaemon: Fix poll() on per_cpu trace_pipe_raw blocks indefinitely

For new GFX bank error decoding support:
  ced615c rasdaemon: Add error decoding for MCA_CTL_SMU extended bits
  73d8177 rasdaemon: mce-amd-smca: Optimizing decoding of MCA_CTL_SMU bits
Added the three patches in https://build.suse.de/request/show/339253.