Bugzilla – Bug 1223798
Nvidia: Grace Perf-stat tool does not support ipc
Last modified: 2024-05-22 16:42:40 UTC
We would like to request a backport of these 2 patches:

perf vendor events arm64: Update N2 and V2 metrics and events using Arm telemetry repo
https://github.com/torvalds/linux/commit/4473949074c35072f598bd525ae51d5455f05745

perf parse-events: Make legacy events lower priority than sysfs/JSON
https://github.com/torvalds/linux/commit/a24d9d9dc096fc0d0bd85302c9a4fe4fe3b1107b

One way to know whether the patches are working is to run this command and check for this output:

sudo perf stat -v -a -M ipc ls /
Using CPUID 0x00000000410fd4f0
metric expr INST_RETIRED / CPU_CYCLES for ipc
found event INST_RETIRED
found event CPU_CYCLES
Parsing metric events '{INST_RETIRED/metric-id=INST_RETIRED/,CPU_CYCLES/metric-id=CPU_CYCLES/}:W'
INST_RETIRED -> armv8_pmuv3_0/metric-id=INST_RETIRED,INST_RETIRED/
CPU_CYCLES -> armv8_pmuv3_0/metric-id=CPU_CYCLES,CPU_CYCLES/
Matched metric-id INST_RETIRED to INST_RETIRED
Matched metric-id CPU_CYCLES to CPU_CYCLES
Control descriptor is not initialized
bin bin.usr-is-merged boot cdrom dev etc home lib lib.usr-is-merged lost+found media mnt opt proc root run sbin sbin.usr-is-merged snap srv swap.img sys tmp usr var
INST_RETIRED: 19656830 442994656 442994656
CPU_CYCLES: 33765594 442994656 442994656

 Performance counter stats for 'system wide':

          19656830      INST_RETIRED   #  0.6 per cycle ipc
          33765594      CPU_CYCLES

       0.002799165 seconds time elapsed

We would like this working with SLES 15 SP5 and SP6. When I tried this command on SLES 15 SP5 I got this:

# sudo perf stat -v -a -M ipc ls /
Using CPUID 0x00000000410fd4f0
Cannot find metric or group `ipc'

 Usage: perf stat [<options>] [<command>]

    -M, --metrics <metric/metric group list>
                          monitor specified metrics or metric groups (separated by ,)

# rpm -qa | grep perf
gperf-3.1-1.27.aarch64
perf-5.14.21-150500.50.44.aarch64
# rpm -qa | grep kernel-64k
kernel-64kb-devel-5.14.21-150500.55.39.1.aarch64
kernel-64kb-5.14.21-150500.55.39.1.aarch64
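As a cross-check on the reported metric, the ipc value can be recomputed from the two raw counts, since the verbose output shows the metric expression is INST_RETIRED / CPU_CYCLES. The following is a minimal shell sketch and not part of the original report: the function name ipc_from_perf_stat is hypothetical, and it assumes the summary lines carry the count in the first column and the event name in the second, as perf stat printed them here.

```shell
# Hypothetical helper (not from the bug report): read `perf stat` output on
# stdin and print INST_RETIRED / CPU_CYCLES to two decimal places.
ipc_from_perf_stat() {
    awk '
        # Summary lines look like "19656830 INST_RETIRED # 0.6 per cycle ipc".
        # The count may contain thousands separators, so strip commas first.
        $2 == "INST_RETIRED" { gsub(/,/, "", $1); inst = $1 + 0 }
        $2 == "CPU_CYCLES"   { gsub(/,/, "", $1); cycles = $1 + 0 }
        END {
            # Fail instead of dividing by zero when no cycle count was seen.
            if (cycles > 0) printf "%.2f\n", inst / cycles
            else exit 1
        }
    '
}
```

perf stat prints its counters on stderr, so one possible invocation is `sudo perf stat -a -M ipc ls / 2>&1 >/dev/null | ipc_from_perf_stat`. For the counts quoted in comment 0 the ratio is 19656830 / 33765594, roughly 0.58, which is consistent with the 0.6 that perf prints.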
(In reply to Carol Soto from comment #0)
> perf vendor events arm64: Update N2 and V2 metrics and events using Arm
> telemetry repo
> https://github.com/torvalds/linux/commit/4473949074c35072f598bd525ae51d5455f05745

This is already in SP6 and is not an issue to backport to SP5.

> perf parse-events: Make legacy events lower priority than sysfs/JSON
> https://github.com/torvalds/linux/commit/a24d9d9dc096fc0d0bd85302c9a4fe4fe3b1107b

I am confused. Reading the backing thread for the above commit, the issue was tracked down to 5ea8f2ccffb23983f02012a2731464586b10fbf3 "perf parse-events: Support hardware events as terms", which caused a v6.5->v6.6 regression.

This commit is not present in SP5.

For SP6 (which is pending release), my concern is that this patch changes the existing default behavior of the tool. Right now legacy events are always given priority. With this patch, if a PMU is specified, the legacy event will have lower priority.
(In reply to Tony Jones from comment #1)
> this is already in SP6 and is not an issue to backport to SP5

For SP6, I can check once a kernel with the first commit is out. Is it already in a beta, or do I need to wait for the release? When I opened this bugzilla I had only checked SP5.
Thanks
Carol
(In reply to Carol Soto from comment #2)
> For SP6 maybe I check when the kernel is out with first commit. Is that in
> beta already or need to wait for release?

$ git log patches.suse/perf-vendor-events-arm64-Update-N2-and-V2-metrics-and-events-using-Arm-telemetry-repo.patch | cat
commit cf0943cb836f8b865f9a6e7cb63e1d39b1c42079
Author: Tony Jones <tonyj@suse.de>
Date:   Sun Jan 14 23:55:52 2024 -0800

    perf vendor events arm64: Update N2 and V2 metrics and events using Arm telemetry repo (perf-v6.7 (jsc#PED-6012 jsc#PED-6121)).

$ grep Git-commit: patches.suse/perf-vendor-events-arm64-Update-N2-and-V2-metrics-and-events-using-Arm-telemetry-repo.patch
Git-commit: 4473949074c35072f598bd525ae51d5455f05745

For SP6, the above fix was in Snapshot-202402-1 and is also in beta4.
(In reply to Tony Jones from comment #3)
> The above fix was, wrt SP6, in Snapshot-202402-1 and also in beta4

Thanks. I will look for a system on Monday, give it a try, and let you know.
Carol
(In reply to Tony Jones from comment #3)
> Git-commit: 4473949074c35072f598bd525ae51d5455f05745
>
> The above fix was, wrt SP6, in Snapshot-202402-1 and also in beta4

WRT the second patch:

What hardware are you using? a24d9d9dc096fc0d0bd85302c9a4fe4fe3b1107b seems to be focused on fixing an issue on Apple (icestorm) ARM64 systems, but in your error output you are referencing a different PMU.
(In reply to Tony Jones from comment #5)
> What hardware are you using as a24d9d9dc096fc0d0bd85302c9a4fe4fe3b1107b
> seems to be focused on fixing an issue on Apple (icestorm) ARM64 systems but
> in your error output you are referencing a different PMU.

I just tried with this SLES 15 SP6 kernel: 6.4.0-150600.9-64kb. The first patch is included, so it's OK:

sudo perf stat -v -a -M ipc ls /
Using CPUID 0x00000000410fd4f0
metric expr INST_RETIRED / CPU_CYCLES for ipc
found event INST_RETIRED
found event CPU_CYCLES
Parsing metric events '{INST_RETIRED/metric-id=INST_RETIRED/,CPU_CYCLES/metric-id=CPU_CYCLES/}:W'
INST_RETIRED -> armv8_pmuv3_0/metric-id=INST_RETIRED,INST_RETIRED/
CPU_CYCLES -> armv8_pmuv3_0/metric-id=CPU_CYCLES,CPU_CYCLES/
Matched metric-id INST_RETIRED to INST_RETIRED
Matched metric-id CPU_CYCLES to CPU_CYCLES
Control descriptor is not initialized
bin boot dev etc home lib lib64 mnt opt proc root run sbin selinux srv sys tmp usr var
INST_RETIRED: 13383342 391883968 391883968
CPU_CYCLES: 31992934 391883968 391883968

 Performance counter stats for 'system wide':

        13,383,342      INST_RETIRED   #  0.4 per cycle ipc
        31,992,934      CPU_CYCLES

       0.002505120 seconds time elapsed

As for the second patch, maybe it is as you said and we do not need it. I cannot see the issue; this command ran OK:

sudo taskset -c 0 perf stat -e armv8_pmuv3_0/cycles/ -e armv8_pmuv3_0/cycles/ -e cycles ls
bin

 Performance counter stats for 'ls':

         1,778,300      armv8_pmuv3_0/cycles/
         1,778,292      armv8_pmuv3_0/cycles/
         1,778,292      cycles

       0.000854048 seconds time elapsed

       0.000854000 seconds user
       0.000000000 seconds sys

Thanks
Carol
(In reply to Carol Soto from comment #7)
> I just tried with this SLES 15 SP6 kernel 6.4.0-150600.9-64kb.

The version of the perf userspace tool (rpm -q perf) is more important than the kernel version in terms of whether the first fix is present.

> The second patch maybe is like you said we do not need it. I can not see the
> issue, this command ran ok.
> sudo taskset -c 0 perf stat -e armv8_pmuv3_0/cycles/ -e
> armv8_pmuv3_0/cycles/ -e cycles ls
> bin

So is there an issue with SP6? Sorry, from your reply I cannot tell.

Do you need 4473949074c3 backported to SP5? Please note that SP5+4473949074c3 is very different from what is in SP6, so for this specific hardware it's unknown whether the above would be sufficient.
(In reply to Tony Jones from comment #8)
> The version of the perf userspace tool (rpm -q perf) is more important than
> the kernel version in terms of whether the fix (the first) is present.

This is the perf version that I tried with SLES 15 SP6:

rpm -qa | grep perf
perf-6.4.0.git18573.c37d66c4fd-150600.1.1.aarch64

> Do you need 4473949074c3 backported to SP5?

Yes, just backporting 4473949074c3 to SP5.
Thanks
Carol
(In reply to Carol Soto from comment #9)
> This is the perf version that I tried with SLES 15 SP6.
> rpm -qa | grep perf
> perf-6.4.0.git18573.c37d66c4fd-150600.1.1.aarch64

That will be fine.

> Yes, just backporting 4473949074c3 to SP5.

I will prepare a test package for you to try.
(In reply to Carol Soto from comment #9)
> Yes, just backporting 4473949074c3 to SP5.

As I alluded to in comment 8, there is more to this than just backporting this commit. There is no JSON support for arm/neoverse-n2-v2 in SP5.
(In reply to Tony Jones from comment #11)
> As I alluded to in comment 8, there is more to this than just backporting
> this commit. There is no json support for arm/neoverse-n2-v2 in SP5.

Yes, I noticed that when I ran the command

sudo perf stat -v -a -M ipc ls /

on SLES 15 SP5: the tool there is missing commits. Will it still be possible to make this command work on SLES 15 SP5?
Thanks
Carol
(In reply to Carol Soto from comment #12)
> Will it still be possible to make
> this command work on SLES 15 SP5?

It depends on what changes are needed. I will try to borrow our matching hardware and see what is involved. If it is complex, you will likely need an ECO. If it's simple, it can be handled by this bugzilla.
(In reply to Tony Jones from comment #13)
> If it is complex you will likely need
> an ECO. If it's simple it can be handled by this bugzilla.

Thanks so much for the info.
Carol
I looked at this some more.

(In reply to Carol Soto from comment #0)
> perf vendor events arm64: Update N2 and V2 metrics and events using Arm
> telemetry repo
> https://github.com/torvalds/linux/commit/4473949074c35072f598bd525ae51d5455f05745

As I mentioned, this patch isn't sufficient, as SP5 lacks all of the prior support for arm/neoverse-n2-v2. So all of its dependent patches will also be needed.

In addition, we also need all the base metric support in 'arm64/sbsa.json', namely:
a9ff64e5a0421914c6b23e4505d9384b8c745b5a
556fd664d666c0cc9d5b0d52851b0480c51cf59e
ab3744007d51420dd63d5323acbe7abbb843ba63

Also needed is the jevents general metric support:
5b51e47a3f1d7619b424b4b89b5d19569a462b09

This general metric support change uses the revised Python-based JSON generator, whereas SP5 has the previous C-based generator. So this would need to be handled as well.

Bottom line, the scope of this is IMO unsuitable for a bugzilla.

We have an ECO process through which significant changes can be requested and SUSE can evaluate them.
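The dependency list above can be checked against a kernel-source checkout by looking for the Git-commit: tags that comment 3 greps for. This is a minimal shell sketch under stated assumptions and not something used in this bug: the helper name check_backports is hypothetical, the patches are assumed to live in a directory such as patches.suse, and each backported patch is assumed to carry a Git-commit: line with the full upstream hash. A "missing" result only means no such tag was found; it does not prove the change is absent.

```shell
# Hypothetical helper (not from the bug report): the first argument is a
# directory of patch files, the remaining arguments are upstream commit IDs.
# Prints "present <id>" or "missing <id>" for each ID.
check_backports() {
    patch_dir=$1
    shift
    for commit in "$@"; do
        # -r: search the directory recursively, -q: no output, -s: ignore
        # unreadable files. Assumes tags of the form "Git-commit: <full hash>".
        if grep -rqs "^Git-commit: *$commit" "$patch_dir"; then
            echo "present $commit"
        else
            echo "missing $commit"
        fi
    done
}
```

A possible invocation for the commits named in comment 15 would be `check_backports patches.suse a9ff64e5a0421914c6b23e4505d9384b8c745b5a 556fd664d666c0cc9d5b0d52851b0480c51cf59e ab3744007d51420dd63d5323acbe7abbb843ba63 5b51e47a3f1d7619b424b4b89b5d19569a462b09`, run from the top of the checkout.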
(In reply to Tony Jones from comment #15)
> Bottom line, the scope of this is IMO unsuitable for a bugzilla.
>
> We have an ECO process through which significant changes can be requested
> and SUSE can evaluate

Hi,
Thanks so much for looking into this. The changes are in SP6. We will communicate this to our team; if they really want this on SP5, we will have to go through the ECO process.
Carol
(In reply to Carol Soto from comment #16)
> The changes are in SP6

Correct. I verified it is working in SP6. Most of the changes were in v6.3, and SP6 perf is based on v6.7.

> if they really want this on SP5, we will have to go through
> the ECO process.

Yes, you will need to provide a suitable rationale as to why you need this feature, and then we can scope the work and decide.

Thanks!
(In reply to Tony Jones from comment #17)
> Yes, you will need to provide suitable rationale as to why you need this
> feature and then we can scope the work and decide.

Please feel free to move the bugzilla to the right state. I'm OK if we say it is resolved in SLES 15 SP6.
Thanks
Carol
(In reply to Carol Soto from comment #18)
> Please feel free to move the bugzilla to the right state. I'm OK if we say
> it is resolved in SLES 15 SP6.

Closing; this is working in SP6. Nvidia will engage in the ECO process if they require this hardware enablement in SP5.