mm/folio_zero_user: add multi-page clearing
From: Ingo Molnar <mingo-AT-kernel.org>
To: Ankur Arora <ankur.a.arora-AT-oracle.com>
Subject: Re: [PATCH v3 0/4] mm/folio_zero_user: add multi-page clearing
Date: Mon, 14 Apr 2025 08:36:18 +0200
Message-ID: <Z_ys4jJ8MQ4-kW8P@gmail.com>
Cc: linux-kernel-AT-vger.kernel.org, linux-mm-AT-kvack.org, x86-AT-kernel.org, torvalds-AT-linux-foundation.org, akpm-AT-linux-foundation.org, bp-AT-alien8.de, dave.hansen-AT-linux.intel.com, hpa-AT-zytor.com, mingo-AT-redhat.com, luto-AT-kernel.org, peterz-AT-infradead.org, paulmck-AT-kernel.org, rostedt-AT-goodmis.org, tglx-AT-linutronix.de, willy-AT-infradead.org, jon.grimm-AT-amd.com, bharata-AT-amd.com, raghavendra.kt-AT-amd.com, boris.ostrovsky-AT-oracle.com, konrad.wilk-AT-oracle.com

* Ankur Arora <ankur.a.arora@oracle.com> wrote:

> We also see performance improvement for cases where this optimization is
> unavailable (pg-sz=2MB on AMD, and pg-sz=2MB|1GB on Intel) because
> REP; STOS is typically microcoded which can now be amortized over
> larger regions and the hint allows the hardware prefetcher to do a
> better job.
>
> Milan (EPYC 7J13, boost=0, preempt=full|lazy):
>
>                  mm/folio_zero_user    x86/folio_zero_user     change
>                   (GB/s +- stddev)      (GB/s +- stddev)
>
>   pg-sz=1GB       16.51 +- 0.54%        42.80 +- 3.48%       + 159.2%
>   pg-sz=2MB       11.89 +- 0.78%        16.12 +- 0.12%       +  35.5%
>
> Icelakex (Platinum 8358, no_turbo=1, preempt=full|lazy):
>
>                  mm/folio_zero_user    x86/folio_zero_user     change
>                   (GB/s +- stddev)      (GB/s +- stddev)
>
>   pg-sz=1GB        8.01 +- 0.24%        11.26 +- 0.48%       +  40.57%
>   pg-sz=2MB        7.95 +- 0.30%        10.90 +- 0.26%       +  37.10%

How was this measured? Could you integrate this measurement as a new
tools/perf/bench/ subcommand so that people can try it on different
systems, etc.? There's already a 'perf bench mem' subcommand space
where this feature could be added to.

Thanks,

	Ingo
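
For readers who want to sanity-check the "change" column in the quoted tables, here is a minimal sketch that recomputes the relative improvement from the rounded GB/s figures. The helper name `pct_change` and the `RESULTS` layout are hypothetical and used only for illustration. It assumes the column is defined as (new / old - 1) * 100; the thread does not state the formula, and the posted percentages were presumably derived from unrounded data, so the last digit can differ slightly.

```python
# Recompute the "change" column from the rounded throughput numbers
# quoted above. Assumption: change = (new / old - 1) * 100.

def pct_change(old_gbps: float, new_gbps: float) -> float:
    """Relative improvement of new over old, in percent."""
    return (new_gbps / old_gbps - 1.0) * 100.0

# (platform, page size, mm/folio_zero_user GB/s, x86/folio_zero_user GB/s,
#  change as reported in the email)
RESULTS = [
    ("Milan",    "1GB", 16.51, 42.80, 159.2),
    ("Milan",    "2MB", 11.89, 16.12, 35.5),
    ("Icelakex", "1GB",  8.01, 11.26, 40.57),
    ("Icelakex", "2MB",  7.95, 10.90, 37.10),
]

if __name__ == "__main__":
    for platform, pg_sz, old, new, reported in RESULTS:
        computed = pct_change(old, new)
        print(f"{platform:9s} pg-sz={pg_sz}: "
              f"computed {computed:+.2f}%, reported {reported:+.2f}%")
```

Every recomputed value lands within 0.1 percentage points of the reported one, which is consistent with the rounding assumption. This only checks the arithmetic; it says nothing about how the underlying throughput was measured, which is the open question in the reply.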