Arch-PEBS and PMU supports for Clearwater Forest
From: Dapeng Mi <dapeng1.mi-AT-linux.intel.com>
To: Peter Zijlstra <peterz-AT-infradead.org>, Ingo Molnar <mingo-AT-redhat.com>, Arnaldo Carvalho de Melo <acme-AT-kernel.org>, Namhyung Kim <namhyung-AT-kernel.org>, Ian Rogers <irogers-AT-google.com>, Adrian Hunter <adrian.hunter-AT-intel.com>, Alexander Shishkin <alexander.shishkin-AT-linux.intel.com>, Kan Liang <kan.liang-AT-linux.intel.com>, Andi Kleen <ak-AT-linux.intel.com>, Eranian Stephane <eranian-AT-google.com>
Subject: [PATCH 00/20] Arch-PEBS and PMU supports for Clearwater Forest
Date: Thu, 23 Jan 2025 14:07:01 +0000
Message-ID: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>
Cc: linux-kernel-AT-vger.kernel.org, linux-perf-users-AT-vger.kernel.org, Dapeng Mi <dapeng1.mi-AT-intel.com>, Dapeng Mi <dapeng1.mi-AT-linux.intel.com>
This patch series enables PMU support and architectural PEBS (arch-PEBS) for Clearwater Forest (CWF).

Compared with the previous-generation Sierra Forest (SRF), CWF has two key differences from the PMU perspective:

a. It adds 3 fixed counters, fixed counters 4, 5 and 6, which are used to profile the topdown-bad-spec, topdown-fe-bound and topdown-retiring events.
b. It introduces architectural PEBS, which replaces the previous DS-area-based PEBS.

Support for the general fixed counter bitmap (CPUID.23H.1H.EBX) has already been upstreamed along with the ARL/LNL PMU enabling patches, so only the CWF-specific event attributes need to be added in this patch series.

Compared with the legacy DS-area-based PEBS, and especially with adaptive PEBS, arch-PEBS inherits essentially all of the currently supported PEBS groups, such as the basic-info, memory-info, GPRs, XMMs (vector registers) and LBRs groups, but adds some new fields to these groups. The key differences between legacy PEBS and arch-PEBS are:

a. Arch-PEBS uses the CPUID.23H.4H and CPUID.23H.5H sub-leaves to enumerate its supported capabilities. These two CPUID sub-leaves indicate which PEBS groups are supported and which GP and fixed counters support arch-PEBS. The IA32_PERF_CAPABILITIES MSR is no longer used to indicate PEBS capabilities.
b. Arch-PEBS adds several new MSRs: IA32_PEBS_BASE, IA32_PEBS_INDEX and IA32_PMC_GPn/FXm_CFG_C.
   IA32_PEBS_BASE tells HW the starting address and size of the PEBS buffer; HW writes the captured records into this buffer.
   IA32_PEBS_INDEX provides several fields, such as WR_OFFSET and THRESH_OFFSET. HW uses WR_OFFSET to tell SW the offset of the latest PEBS record in the PEBS buffer, and THRESH_OFFSET tells HW to generate a PMI when the currently written PEBS record crosses this offset.
   IA32_PMC_GPn/FXm_CFG_C is a per-counter MSR: each GP or fixed counter has its own CFG_C MSR, which configures which PEBS groups are captured for that counter when it overflows.
   Since each counter has its own CFG_C MSR, arch-PEBS can capture different PEBS groups for different counters, whereas legacy DS-based PEBS supports only a single global PEBS configuration that all counters must share. This is a significant improvement in arch-PEBS, providing more flexibility and higher efficiency. The legacy MSRs IA32_PEBS_ENABLE and MSR_PEBS_DATA_CFG are deprecated in arch-PEBS.
c. Arch-PEBS can capture more registers, such as the SSP register and wider xsave-enabled vector registers like the OPMASK/YMM/ZMM registers. Not all platforms support capturing all of these vector registers; HW reports which vector registers are supported in CPUID.23H.4H.EBX. CWF, for example, only supports capturing XMM/YMM registers. The newly added SSP register is placed in the GPRs group, and all vector registers, including XMM registers, are placed in the xsave-enabled registers group.
d. Arch-PEBS changes the PEBS record layout, although some groups keep the same format as in legacy adaptive PEBS. Arch-PEBS also supports PEBS record fragments: if the continued bit in the record header is set, more fragments follow.

The details of arch-PEBS can be found in chapter 11, "Architectural PEBS", of the "Intel Architecture Instruction Set Extensions Programming Reference"[1].

Patch 01/20 provides basic PMU support for CWF, patch 02/20 fixes an error in parsing the archPerfmonExt (0x23) CPUID leaf, and the remaining patches (03/20 ~ 20/20) provide arch-PEBS support for the kernel and perf tools.

This patch series is based on Peter's perf/core tree plus Kan's PEBS counter snapshot patch set (v10)[2].

Tests:
  The following tests were run on CWF and no issues were found. Note that nmi_watchdog was disabled when running the tests.

  a. Basic perf counting case.
     perf stat -e '{branches,branches,branches,branches,branches,branches,branches,branches,cycles,instructions,ref-cycles,topdown-bad-spec,topdown-fe-bound,topdown-retiring}' sleep 1
  b. Basic PMI based perf sampling case.
     perf record -e '{branches,branches,branches,branches,branches,branches,branches,branches,cycles,instructions,ref-cycles,topdown-bad-spec,topdown-fe-bound,topdown-retiring}' sleep 1
  c. Basic PEBS based perf sampling case.
     perf record -e '{branches,branches,branches,branches,branches,branches,branches,branches,cycles,instructions,ref-cycles,topdown-bad-spec,topdown-fe-bound,topdown-retiring}:p' sleep 1
  d. PEBS sampling case with basic, GPRs, vector-registers and LBR groups
     perf record -e branches:p -Iax,bx,ip,ssp,xmm0,ymmh0 -b -c 10000 sleep 1
  e. PEBS sampling case with auxiliary (memory info) group
     perf mem record sleep 1
  f. PEBS sampling case with counter group
     perf record -e '{branches:p,branches,cycles}:S' -c 10000 sleep 1
  g. Perf stat and record test
     perf test 95; perf test 119
  h. perf-fuzzer test

Ref:
  [1] https://www.intel.com/content/www/us/en/content-details/8...
  [2] https://lore.kernel.org/all/20250121152303.3128733-1-kan....
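As an illustration (not code from this series), the GP and fixed counter bitmaps enumerated by the CPUID.23H sub-leaves can be folded into one 64-bit counter mask using perf's x86 convention of placing fixed counter n at bit 32 + n (INTEL_PMC_IDX_FIXED). The helper names below are hypothetical, and which CPUID output register carries which bitmap is an assumption left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* perf on x86 places fixed counter n at bit (32 + n) of a 64-bit counter
 * mask (INTEL_PMC_IDX_FIXED in the kernel). */
#define PMC_IDX_FIXED 32

/* Hypothetical helper: merge a GP-counter bitmap and a fixed-counter bitmap
 * (enumerated by CPUID.23H; decoding the exact output registers is assumed
 * to be done by the caller) into a single 64-bit mask. */
static uint64_t pebs_counter_mask(uint32_t gp_bitmap, uint32_t fixed_bitmap)
{
	return ((uint64_t)fixed_bitmap << PMC_IDX_FIXED) | gp_bitmap;
}

/* Hypothetical helper: does counter index idx support arch-PEBS? */
static bool counter_supports_pebs(uint64_t mask, int idx)
{
	if (idx < 0 || idx >= 64)
		return false;
	return (mask >> idx) & 1;
}
```

For example, a fixed-counter bitmap of 0x7f would mark fixed counters 0 through 6, which covers the new fixed counters 4, 5 and 6.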
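The WR_OFFSET/THRESH_OFFSET interplay of IA32_PEBS_INDEX can be modeled in a few lines of C. This is a software-only sketch under stated assumptions: the struct and helper names are hypothetical, the real MSR packs these fields into bit ranges that are not reproduced here, "crossing" is taken to mean a record that starts below THRESH_OFFSET and ends at or beyond it, and the drain step simply rewinds the write offset to 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the IA32_PEBS_INDEX fields. */
struct pebs_index_model {
	uint32_t wr_offset;     /* end of the last record HW wrote */
	uint32_t thresh_offset; /* PMI once a record crosses this offset */
};

/* Model HW appending a record of rec_size bytes at wr_offset.
 * Returns true if this record crossed THRESH_OFFSET (a PMI would fire). */
static bool hw_write_record(struct pebs_index_model *idx, uint32_t rec_size)
{
	assert(rec_size > 0);
	uint32_t start = idx->wr_offset;
	uint32_t end = start + rec_size;

	idx->wr_offset = end;
	return start < idx->thresh_offset && end >= idx->thresh_offset;
}

/* Model the PMI handler: report how many bytes of records are pending
 * (everything below the write offset), then rewind it for reuse. */
static uint32_t sw_drain(struct pebs_index_model *idx)
{
	uint32_t pending = idx->wr_offset;

	idx->wr_offset = 0;
	return pending;
}
```

With THRESH_OFFSET at 256 and 100-byte records, the first two writes stay below the threshold and the third (bytes 200..300) crosses it, so only that write would raise a PMI.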
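The efficiency argument for per-counter CFG_C MSRs can be made concrete with a small C sketch. The group bit values and helper names below are hypothetical placeholders, not the real CFG_C layout. The point is that a single shared legacy configuration must be the union of every counter's request, so a counter may capture groups it never asked for, whereas a per-counter CFG_C holds exactly that counter's groups.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical group-select bits; not the real CFG_C bit layout. */
enum pebs_group {
	GRP_MEMINFO = 1u << 0,
	GRP_GPR     = 1u << 1,
	GRP_VECR    = 1u << 2,
	GRP_LBR     = 1u << 3,
};

#define NUM_CNTRS 4

/* Legacy adaptive PEBS: one data config shared by all PEBS counters, so it
 * has to be the union of what every counter asked for. */
static uint32_t legacy_shared_cfg(const uint32_t req[NUM_CNTRS])
{
	uint32_t cfg = 0;

	for (int i = 0; i < NUM_CNTRS; i++)
		cfg |= req[i];
	return cfg;
}

/* Arch-PEBS model: program one counter's own CFG_C with exactly its groups. */
static void arch_pebs_set_cfg_c(uint32_t cfg_c[NUM_CNTRS], int idx, uint32_t groups)
{
	assert(idx >= 0 && idx < NUM_CNTRS);
	cfg_c[idx] = groups;
}

/* Groups a counter ends up capturing without having requested them. */
static uint32_t overcaptured(uint32_t programmed, uint32_t requested)
{
	return programmed & ~requested;
}
```

If counter 0 wants only GPRs while other counters want LBRs and memory info, the shared legacy configuration makes counter 0 capture all three, while its own CFG_C would carry GPRs only.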
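A minimal C sketch of walking a buffer of record fragments, where a logical record ends at the first fragment whose continued bit is clear. The header layout used here (fragment size in bits 15:0, continued flag in bit 31) and the function name are assumptions made purely for illustration; the real arch-PEBS header format is defined in the ISE reference [1].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout for illustration only: bits 15:0 hold the
 * fragment size in bytes (header included), bit 31 is the continued bit. */
#define HDR_SIZE_MASK  0xffffu
#define HDR_CONTINUED  (1u << 31)

/* Count complete logical records in buf[0..len). A record ends at the first
 * fragment whose continued bit is clear; a trailing fragment that still has
 * the continued bit set is not counted. Returns -1 on a malformed buffer. */
static int count_logical_records(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int records = 0;

	assert(buf != NULL || len == 0);
	while (off + sizeof(uint32_t) <= len) {
		uint32_t hdr;

		memcpy(&hdr, buf + off, sizeof(hdr)); /* avoid unaligned loads */
		uint32_t size = hdr & HDR_SIZE_MASK;

		if (size < sizeof(hdr) || off + size > len)
			return -1;
		off += size;
		if (!(hdr & HDR_CONTINUED))
			records++; /* last fragment closes the logical record */
	}
	return off == len ? records : -1;
}
```

Three 8-byte fragments where only the first has the continued bit set form two logical records: the first two fragments together, then the third on its own.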
Dapeng Mi (19):
  perf/x86/intel: Add PMU support for Clearwater Forest
  perf/x86/intel: Parse CPUID archPerfmonExt leaves for non-hybrid CPUs
  perf/x86/intel: Decouple BTS initialization from PEBS initialization
  perf/x86/intel: Rename x86_pmu.pebs to x86_pmu.ds_pebs
  perf/x86/intel: Initialize architectural PEBS
  perf/x86/intel/ds: Factor out common PEBS processing code to functions
  perf/x86/intel: Process arch-PEBS records or record fragments
  perf/x86/intel: Factor out common functions to process PEBS groups
  perf/x86/intel: Allocate arch-PEBS buffer and initialize PEBS_BASE MSR
  perf/x86/intel: Setup PEBS constraints base on counter & pdist map
  perf/x86/intel: Setup PEBS data configuration and enable legacy groups
  perf/x86/intel: Add SSP register support for arch-PEBS
  perf/x86/intel: Add counter group support for arch-PEBS
  perf/core: Support to capture higher width vector registers
  perf/x86/intel: Support arch-PEBS vector registers group capturing
  perf tools: Support to show SSP register
  perf tools: Support to capture more vector registers (common part)
  perf tools: Support to capture more vector registers (x86/Intel part)
  perf tools/tests: Add vector registers PEBS sampling test

Kan Liang (1):
  perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF

 arch/arm/kernel/perf_regs.c                  |   6 +
 arch/arm64/kernel/perf_regs.c                |   6 +
 arch/csky/kernel/perf_regs.c                 |   5 +
 arch/loongarch/kernel/perf_regs.c            |   5 +
 arch/mips/kernel/perf_regs.c                 |   5 +
 arch/powerpc/perf/perf_regs.c                |   5 +
 arch/riscv/kernel/perf_regs.c                |   5 +
 arch/s390/kernel/perf_regs.c                 |   5 +
 arch/x86/events/core.c                       |  94 ++-
 arch/x86/events/intel/bts.c                  |   6 +-
 arch/x86/events/intel/core.c                 | 292 +++++++-
 arch/x86/events/intel/ds.c                   | 697 ++++++++++++++----
 arch/x86/events/perf_event.h                 |  62 +-
 arch/x86/include/asm/intel_ds.h              |  10 +-
 arch/x86/include/asm/msr-index.h             |  28 +
 arch/x86/include/asm/perf_event.h            | 147 +++-
 arch/x86/include/uapi/asm/perf_regs.h        |  86 ++-
 arch/x86/kernel/perf_regs.c                  |  55 +-
 include/linux/perf_event.h                   |   2 +
 include/linux/perf_regs.h                    |  10 +
 include/uapi/linux/perf_event.h              |  10 +
 kernel/events/core.c                         |  53 +-
 tools/arch/x86/include/uapi/asm/perf_regs.h  |  87 ++-
 tools/include/uapi/linux/perf_event.h        |  13 +
 tools/perf/arch/arm/util/perf_regs.c         |   5 +-
 tools/perf/arch/arm64/util/perf_regs.c       |   5 +-
 tools/perf/arch/csky/util/perf_regs.c        |   5 +-
 tools/perf/arch/loongarch/util/perf_regs.c   |   5 +-
 tools/perf/arch/mips/util/perf_regs.c        |   5 +-
 tools/perf/arch/powerpc/util/perf_regs.c     |   9 +-
 tools/perf/arch/riscv/util/perf_regs.c       |   5 +-
 tools/perf/arch/s390/util/perf_regs.c        |   5 +-
 tools/perf/arch/x86/util/perf_regs.c         | 108 ++-
 tools/perf/builtin-script.c                  |  19 +-
 tools/perf/tests/shell/record.sh             |  55 ++
 tools/perf/util/evsel.c                      |  14 +-
 tools/perf/util/intel-pt.c                   |   2 +-
 tools/perf/util/parse-regs-options.c         |  23 +-
 .../perf/util/perf-regs-arch/perf_regs_x86.c |  90 +++
 tools/perf/util/perf_regs.c                  |   5 -
 tools/perf/util/perf_regs.h                  |  18 +-
 tools/perf/util/record.h                     |   2 +-
 tools/perf/util/sample.h                     |   6 +-
 tools/perf/util/session.c                    |  31 +-
 tools/perf/util/synthetic-events.c           |   7 +-
 45 files changed, 1851 insertions(+), 267 deletions(-)

-- 
2.40.1