Hierarchical Constant Bandwidth Server
From: Yuri Andriaccio <yurand2000-AT-gmail.com>
To: Ingo Molnar <mingo-AT-redhat.com>, Peter Zijlstra <peterz-AT-infradead.org>, Juri Lelli <juri.lelli-AT-redhat.com>, Vincent Guittot <vincent.guittot-AT-linaro.org>, Dietmar Eggemann <dietmar.eggemann-AT-arm.com>, Steven Rostedt <rostedt-AT-goodmis.org>, Ben Segall <bsegall-AT-google.com>, Mel Gorman <mgorman-AT-suse.de>, Valentin Schneider <vschneid-AT-redhat.com>
Subject: [RFC PATCH v2 00/25] Hierarchical Constant Bandwidth Server
Date: Thu, 31 Jul 2025 12:55:18 +0200
Message-ID: <20250731105543.40832-1-yurand2000@gmail.com>
Cc: linux-kernel-AT-vger.kernel.org, Luca Abeni <luca.abeni-AT-santannapisa.it>, Yuri Andriaccio <yuri.andriaccio-AT-santannapisa.it>

Hello,

This is v2 of the Hierarchical Constant Bandwidth Server (HCBS) patchset, which aims to replace the current RT_GROUP_SCHED mechanism with something more robust and theoretically sound. The patchset was presented at OSPM25 (https://retis.sssup.it/ospm-summit/), and a summary of its inner workings can be found at https://lwn.net/Articles/1021332/ . A link to v1 is at the bottom of this message; its cover letter describes in more detail what this patchset is about and how it is implemented.

The big update in v2 is the addition of migration code, which allows tasks to be migrated between CPUs (respecting, of course, affinity settings). As requested, we have split the big patches into smaller chunks to improve readability. Additionally, the series has been rebased on the latest tip/master to keep up with the latest scheduler updates and the new dl_server features.

Last but not least, the first patch, which has been posted separately at https://lore.kernel.org/all/20250725164412.35912-1-yurand... , is necessary to fully utilize the deadline bandwidth while keeping the fair-servers active. Refer to that link for details. The issue it addresses also affects HCBS: in the current kernel, by default, 5% of the realtime bandwidth is reserved for fair-servers, 5% is not usable, and only the remaining 90% can be used by deadline tasks or, in our case, by HCBS dl_servers. The first patch fixes this and makes the full default 95% of bandwidth usable by rt-tasks/servers.

Summary of the patches:
     1) Account fair-servers bw separately from other dl tasks and servers bw.
   2-5) Preparation patches, so that the RT classes' code can be used both
        for normal and cgroup scheduling.
  6-15) Implementation of HCBS, with no migration and only a one-level
        hierarchy. The old RT_GROUP_SCHED code is removed.
 16-18) Remove cgroups v1 in favour of v2.
    19) Add support for deeper hierarchies.
 20-25) Add support for task migration.

Updates from v1:
- Rebase to tip/master.
- Add migration code.
- Split big patches for better readability.
- Refactor code to use guarded locks where applicable.
- Remove patches from v1 that are no longer needed because mainline has
  addressed the same issues differently.
- Remove unnecessary checks and general code cleanup.

Notes:
Task migration support needs some extra work to reduce its invasiveness, especially patches 22-23.

Testing v2:
The HCBS mechanism has been further evaluated on two fully-fledged distros rather than in virtual machines, and this latest version proved stable. A small suite of regression tests shows that the newly added mechanism does not break fair-servers or other scheduling mechanisms. Stress tests show that our implementation is robust, while time-based tests show that the theoretical analysis of real-time tasksets matches the implementation's behaviour.

The tests can be found at https://github.com/Yurand2000/HCBS-rust-initrd . The executables are essentially the same as the ones mentioned in v1, apart from some minor updates. Refer to the v1 cover letter for additional details.

Future Work:
We want to test this patchset further and to document the test suite better, so that it can be fully automated and run by other people as well. Additionally, we will finish the currently partial and untested implementation of HCBS with different runtimes per CPU, instead of the same runtime allocated on all CPUs, and include it in a future RFC.

Future patches:
- HCBS with different runtimes per CPU.
- Capacity-aware bandwidth reservation.
- Enable/disable dl_servers when a CPU goes online/offline.

Have a nice day,
Yuri

v1: https://lore.kernel.org/all/20250605071412.139240-1-yuran...
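
For readers who want to check the bandwidth figures quoted in the description of the first patch, the arithmetic can be sketched as below. This is only an illustration and not code from the patchset: the macro and function names are made up, the period and runtime values are the kernel's default sched_rt_period_us and sched_rt_runtime_us settings, and the fair-server share is simply the 5% stated in this letter. Bandwidth is expressed in parts per thousand to avoid floating point.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; the names are hypothetical, not kernel symbols. */
#define RT_PERIOD_US           1000000ULL /* default sched_rt_period_us */
#define RT_RUNTIME_US           950000ULL /* default sched_rt_runtime_us (95%) */
#define FAIR_SERVER_RUNTIME_US   50000ULL /* 5% of the period, as stated in the letter */

/* Bandwidth of a (runtime, period) pair in parts per thousand. */
static uint64_t bw_permille(uint64_t runtime_us, uint64_t period_us)
{
	assert(period_us > 0);
	return runtime_us * 1000 / period_us;
}

/*
 * Behaviour described for the current kernel: fair-servers are charged
 * against the same 95% cap, so deadline tasks and HCBS dl_servers are
 * left with what remains after subtracting the fair-server share.
 */
static uint64_t usable_bw_shared(void)
{
	return bw_permille(RT_RUNTIME_US, RT_PERIOD_US) -
	       bw_permille(FAIR_SERVER_RUNTIME_US, RT_PERIOD_US);
}

/*
 * Behaviour described for patch 1: fair-servers are accounted separately,
 * so the whole cap is available to rt-tasks/servers.
 */
static uint64_t usable_bw_separate(void)
{
	return bw_permille(RT_RUNTIME_US, RT_PERIOD_US);
}
```

Under these assumptions usable_bw_shared() evaluates to 900 (90%) and usable_bw_separate() to 950 (95%), matching the figures given in the description of the first patch.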
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Yuri Andriaccio (6):
  sched/deadline: Remove fair-servers from real-time task's bandwidth accounting
  sched/rt: Disable RT_GROUP_SCHED
  sched/deadline: Account rt-cgroups bandwidth in sched_dl_global_validate
  sched/rt: Remove support for cgroups-v1
  sched/rt: Zero rt-cgroups default bandwidth
  sched/core: Execute enqueued balance callbacks when migrating task betweeen cgroups

luca abeni (19):
  sched/deadline: Do not access dl_se->rq directly
  sched/deadline: Distinct between dl_rq and my_q
  sched/rt: Pass an rt_rq instead of an rq where needed
  sched/rt: Move some functions from rt.c to sched.h
  sched/rt: Introduce HCBS specific structs in task_group
  sched/deadline: Account rt-cgroups bandwidth in deadline tasks schedulability tests.
  sched/core: Initialize root_task_group
  sched/deadline: Add dl_init_tg
  sched/rt: Add {alloc/free}_rt_sched_group and dl_server specific functions
  sched/rt: Add HCBS related checks and operations for rt tasks
  sched/rt: Update rt-cgroup schedulability checks
  sched/rt: Remove old RT_GROUP_SCHED data structures
  sched/core: Cgroup v2 support
  sched/deadline: Allow deeper hierarchies of RT cgroups
  sched/rt: Add rt-cgroup migration
  sched/rt: add HCBS migration related checks and function calls
  sched/deadline: Make rt-cgroup's servers pull tasks on timer replenishment
  sched/deadline: Fix HCBS migrations on server stop
  sched/core: Execute enqueued balance callbacks when changing allowed CPUs

 include/linux/sched.h    |   10 +-
 kernel/sched/autogroup.c |    4 +-
 kernel/sched/core.c      |   68 +-
 kernel/sched/deadline.c  |  311 ++--
 kernel/sched/debug.c     |    6 -
 kernel/sched/fair.c      |    6 +-
 kernel/sched/rt.c        | 3024 ++++++++++++++++++--------------------
 kernel/sched/sched.h     |  140 +-
 kernel/sched/syscalls.c  |    6 +-
 kernel/sched/topology.c  |    8 -
 10 files changed, 1829 insertions(+), 1754 deletions(-)

--
2.50.1