From: Yafang Shao <laoar.shao-AT-gmail.com>
To: roman.gushchin-AT-linux.dev, inwardvessel-AT-gmail.com, shakeel.butt-AT-linux.dev, akpm-AT-linux-foundation.org, ast-AT-kernel.org, daniel-AT-iogearbox.net, andrii-AT-kernel.org, mkoutny-AT-suse.com, yu.c.chen-AT-intel.com, zhao1.liu-AT-intel.com
Subject: [RFC PATCH bpf-next 0/3] BPF-based NUMA balancing
Date: Tue, 13 Jan 2026 20:12:35 +0800
Message-ID: <20260113121238.11300-1-laoar.shao@gmail.com>
Cc: bpf-AT-vger.kernel.org, linux-mm-AT-kvack.org, Yafang Shao <laoar.shao-AT-gmail.com>

In our large fleet of Kubernetes-managed servers, NUMA balancing has been
historically disabled globally on each server. With increasing deployment
of AMD EPYC servers in our fleet, cross-NUMA access has become a critical
performance issue, prompting us to consider enabling NUMA balancing to
address it.

However, enabling NUMA balancing globally is not acceptable as it would
increase overall system overhead and potentially introduce latency spikes
for latency-sensitive workloads. Instead, we aim to enable it selectively
for workloads that can genuinely benefit from it. Even for such workloads,
we require fine-grained per-workload tuning capabilities.

To maximize cross-NUMA page migration while minimizing overhead, we
propose tuning NUMA balancing per workload using BPF.

This patchset introduces a new BPF hook ->numab_hook() as a memory cgroup
based struct-ops. This enables NUMA balancing for specific workloads
while keeping global NUMA balancing disabled. It also allows tuning
NUMA balancing parameters per workload. Patch #3 demonstrates how to
adjust the hot threshold per workload using BPF.

Since bpf_struct_ops and cgroups integration [0] is still under
development by Roman, this patchset temporarily embeds the cgroup ID
into the struct-ops for review purposes. We can migrate to the new
approach once it's available.
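
For readers who want a concrete picture of the dispatch idea, here is a
minimal userspace sketch in plain C. It is not the code from this series and
it is not a BPF program: it only models an ops table that carries an embedded
cgroup ID and a ->numab_hook() callback, with balancing staying off for any
cgroup that has no ops attached. Apart from the hook name, everything here is
an assumption made for illustration: the struct layout, the hook signature,
and the helpers numab_register(), numab_enabled_for() and always_enable() are
all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a struct-ops with an embedded cgroup ID.
 * The real layout and hook signature in the patchset may differ.
 */
struct numab_ops {
	uint64_t cgroup_id;                     /* cgroup these ops apply to */
	bool (*numab_hook)(uint64_t cgroup_id); /* true: balance this cgroup */
};

#define NUMAB_MAX_OPS 8

static const struct numab_ops *numab_registered[NUMAB_MAX_OPS];
static size_t numab_nr_registered;

/* Attach an ops table; returns 0 on success, -1 if the table is full. */
static int numab_register(const struct numab_ops *ops)
{
	assert(ops != NULL && ops->numab_hook != NULL);
	if (numab_nr_registered == NUMAB_MAX_OPS)
		return -1;
	numab_registered[numab_nr_registered++] = ops;
	return 0;
}

/*
 * Decide whether NUMA balancing applies to a cgroup. With no matching
 * ops the answer is false, which models the globally disabled default.
 */
static bool numab_enabled_for(uint64_t cgroup_id)
{
	for (size_t i = 0; i < numab_nr_registered; i++) {
		const struct numab_ops *ops = numab_registered[i];

		if (ops->cgroup_id == cgroup_id)
			return ops->numab_hook(cgroup_id);
	}
	return false;
}

/* Example policy: always opt the attached cgroup in. */
static bool always_enable(uint64_t cgroup_id)
{
	(void)cgroup_id;
	return true;
}
```

In this model, registering an ops table with cgroup_id 42 and always_enable()
makes numab_enabled_for(42) return true, while every other cgroup keeps the
disabled default.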

This is still an RFC with limited testing. Any feedback is welcome.

[0]. https://lore.kernel.org/bpf/CAADnVQJGiH_yF=AoFSRy4zh20une...

Yafang Shao (3):
  sched: add helpers for numa balancing
  mm: add support for bpf based numa balancing
  mm: set numa balancing hot threshold with bpf

 MAINTAINERS                          |   1 +
 include/linux/memcontrol.h           |   6 +
 include/linux/sched/numa_balancing.h |  44 +++++
 kernel/sched/fair.c                  |  17 +-
 kernel/sched/sched.h                 |   2 -
 mm/Makefile                          |   5 +
 mm/bpf_numa_balancing.c              | 252 +++++++++++++++++++++++++++
 mm/memory-tiers.c                    |   3 +-
 mm/mempolicy.c                       |   3 +-
 mm/migrate.c                         |   7 +-
 mm/vmscan.c                          |   7 +-
 11 files changed, 326 insertions(+), 21 deletions(-)
 create mode 100644 mm/bpf_numa_balancing.c

--
2.43.5