Donor Migration for Proxy Execution (v22)
From: John Stultz <jstultz-AT-google.com>
To: LKML <linux-kernel-AT-vger.kernel.org>
Subject: [PATCH v22 0/6] Donor Migration for Proxy Execution (v22)
Date: Fri, 26 Sep 2025 03:29:08 +0000
Message-ID: <20250926032931.27663-1-jstultz@google.com>
Cc: John Stultz <jstultz-AT-google.com>, Joel Fernandes <joelagnelf-AT-nvidia.com>, Qais Yousef <qyousef-AT-layalina.io>, Ingo Molnar <mingo-AT-redhat.com>, Peter Zijlstra <peterz-AT-infradead.org>, Juri Lelli <juri.lelli-AT-redhat.com>, Vincent Guittot <vincent.guittot-AT-linaro.org>, Dietmar Eggemann <dietmar.eggemann-AT-arm.com>, Valentin Schneider <vschneid-AT-redhat.com>, Steven Rostedt <rostedt-AT-goodmis.org>, Ben Segall <bsegall-AT-google.com>, Zimuzo Ezeozue <zezeozue-AT-google.com>, Mel Gorman <mgorman-AT-suse.de>, Will Deacon <will-AT-kernel.org>, Waiman Long <longman-AT-redhat.com>, Boqun Feng <boqun.feng-AT-gmail.com>, "Paul E. McKenney" <paulmck-AT-kernel.org>, Metin Kaya <Metin.Kaya-AT-arm.com>, Xuewen Yan <xuewen.yan94-AT-gmail.com>, K Prateek Nayak <kprateek.nayak-AT-amd.com>, Thomas Gleixner <tglx-AT-linutronix.de>, Daniel Lezcano <daniel.lezcano-AT-linaro.org>, Suleiman Souhlal <suleiman-AT-google.com>, kuyo chang <kuyo.chang-AT-mediatek.com>, hupu <hupu.gm-AT-gmail.com>, kernel-team-AT-android.com

Hey All,

I wanted to continue pushing for feedback on the next chunk of the
series: Donor Migration

This is just the next step for Proxy Execution, to allow us to
migrate blocked donors across runqueues to boost remote lock owners.

As always, I’m trying to submit this larger work in smallish
digestible pieces, so in this portion of the series, I’m only
submitting for review and consideration the logic that allows us to
do donor (blocked waiter) migration. This requires some additional
changes to locking and extra state tracking to ensure we don’t
accidentally run a migrated donor on a cpu it isn’t affined to, as
well as some extra handling to deal with balance callback state that
needs to be reset when we decide to pick a different task after
doing donor migration.

My last version got a lot of great feedback from K Prateek Nayak,
which, while not significantly changing behavior, did have me
reworking and reorganizing quite a bit of code in this series:
* Reworking find_proxy_task() to avoid mixing gotos with guard()
  usage. Instead break and switch() on a set action enum.
* Zap callbacks when we resched idle
* Remove unjustified curr != donor check in pick_next_task_fair()
* Simplifications around put_prev_set_next() in the migration logic
* Reorder functions for readability
* Move a few task_struct elements under
  #ifdef CONFIG_SCHED_PROXY_EXEC
* Switch to one-line stubs and other white space and spelling
  cleanups.

I’d love to get further feedback on any place where these patches
are confusing, or could use additional clarifications.

Also Suleiman Souhlal and I have been working on some enhancements
to the full Proxy Execution series:
* Suleiman has implemented a first pass at enabling Proxy Exec on
  rw_sems! Rw_sems have been another common source of priority
  inversion problems, so I’m excited to have the Proxy Exec approach
  help solve those issues as well. More work and validation are
  required, but it’s very exciting!
* I’ve been working to allow Proxy Exec to work with sched_ext.
  Currently I’ve worked out the crashers I was initially seeing.
  However, I find my stress tests tend to eventually cause problems,
  though unfortunately this seems to be the case without proxy-exec
  as well, and seems to be due to the missing dl_server for
  sched_ext. I need to try to test with Andrea Righi’s series here:
    https://lore.kernel.org/lkml/20250903095008.162049-1-arig...
  I still have further work to better understand if Proxy switching
  the selected task breaks bpf scheduler assumptions and what might
  be done about it.

Also you can find the full proxy-exec series here:
  https://github.com/johnstultz-work/linux-dev/commits/prox...
  https://github.com/johnstultz-work/linux-dev.git proxy-exec-v22-6.17-rc6

Issues still to address with the full series:
* Continue working to get sched_ext to be ok with proxy-execution
  enabled.
* K Prateek Nayak re-did some performance testing with both this set
  and the full series, and while the set I’m submitting here looked
  ok, the full series did see regressions. I’m working to reproduce
  this so I can narrow the issue down.
* The chain migration functionality needs further iterations and
  better validation to ensure it truly maintains the RT/DL load
  balancing invariants (despite this being broken in vanilla
  upstream with RT_PUSH_IPI currently)

Future work:
* Expand to more locking primitives: Figuring out pi-futexes would
  be good too.
* Eventually: Work to replace rt_mutexes and get things happy with
  PREEMPT_RT

I’d really appreciate any feedback or review thoughts on the full
series as well. I’m trying to keep the chunks small, reviewable and
iteratively testable, but if you have any suggestions on how to
improve the larger series, I’m all ears.

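For readers unfamiliar with the pattern named in the first rework
item (break out of guarded scopes, then switch() on an action enum
rather than jumping with goto), here is a minimal userspace sketch.
It is not the code from the patches: every toy_* name, the struct
layout, and the two decisions it makes (deactivate when the owner is
not on a runqueue, migrate when the owner sits on another CPU) are
assumptions made up for illustration. It relies on the GCC/Clang
cleanup attribute, the same compiler feature the kernel's guard()
helpers are built on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock; depth counts outstanding holds. */
struct toy_lock { int depth; };

static void toy_lock_acquire(struct toy_lock *l) { l->depth++; }
/* Cleanup handlers receive a pointer to the guarded variable. */
static void toy_lock_release(struct toy_lock **lp) { (*lp)->depth--; }

/* Scope-based guard: the lock is dropped when 'name' leaves scope. */
#define TOY_GUARD(name, lockp)                                        \
	struct toy_lock *name                                         \
		__attribute__((cleanup(toy_lock_release))) = (lockp); \
	toy_lock_acquire(name)

/* Hypothetical task: 'owner' is the holder of the lock we wait on. */
struct toy_task {
	struct toy_lock blocked_lock;
	struct toy_task *owner;	/* NULL when not blocked */
	int cpu;
	bool on_rq;
};

enum toy_action { TOY_FOUND, TOY_MIGRATE, TOY_DEACTIVATE };

/*
 * Walk the blocked-on chain starting at 'donor'. Inside the guarded
 * scope we only record what to do and break; the work itself happens
 * in the switch() below, after the guard has released the lock.
 */
static struct toy_task *toy_find_proxy(struct toy_task *donor, int this_cpu,
				       enum toy_action *taken)
{
	enum toy_action action = TOY_FOUND;
	struct toy_task *p = donor;

	while (p->owner) {
		TOY_GUARD(g, &p->blocked_lock);
		assert(p->blocked_lock.depth == 1);

		struct toy_task *owner = p->owner;
		if (!owner->on_rq) {
			action = TOY_DEACTIVATE;
			break;		/* guard released here */
		}
		if (owner->cpu != this_cpu) {
			action = TOY_MIGRATE;
			break;		/* guard released here */
		}
		p = owner;		/* guard released at end of body */
	}

	*taken = action;
	switch (action) {
	case TOY_FOUND:
		return p;		/* end of chain: task to run */
	case TOY_MIGRATE:
		donor->cpu = p->owner->cpu;	/* toy "donor migration" */
		return NULL;
	case TOY_DEACTIVATE:
		donor->on_rq = false;
		return NULL;
	}
	return NULL;
}
```

Because each exit from the loop body is a plain break, the cleanup
handler always runs before the switch(), so none of the follow-up
actions execute with the toy lock held, and no goto ever has to cross
a guarded scope.
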
Credit/Disclaimer:
---------------------
As always, this Proxy Execution series has a long history with lots
of developers that deserve credit:

First described in a paper[2] by Watkins, Straub, Niehaus, then from
patches from Peter Zijlstra, extended with lots of work by Juri
Lelli, Valentin Schneider, and Connor O'Brien. (and thank you to
Steven Rostedt for providing additional details here!)

So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are likely mine.

Thanks so much!
-john

[1] https://lore.kernel.org/lkml/20250805001026.2247040-1-jst...
[2] https://static.lwn.net/images/conf/rtlws11/papers/proc/p3...

Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: kuyo chang <kuyo.chang@mediatek.com>
Cc: hupu <hupu.gm@gmail.com>
Cc: kernel-team@android.com

John Stultz (5):
  locking: Add task::blocked_lock to serialize blocked_on state
  sched/locking: Add blocked_on_state to provide necessary tri-state
    for proxy return-migration
  sched: Add logic to zap balance callbacks if we pick again
  sched: Handle blocked-waiter migration (and return migration)
  sched: Migrate whole chain in proxy_migrate_task()

Peter Zijlstra (1):
  sched: Add blocked_donor link to task for smarter mutex handoffs

 include/linux/sched.h        | 130 ++++++++++----
 init/init_task.c             |   6 +
 kernel/fork.c                |   7 +-
 kernel/locking/mutex-debug.c |   4 +-
 kernel/locking/mutex.c       |  86 +++++++--
 kernel/locking/ww_mutex.h    |  20 +--
 kernel/sched/core.c          | 339 ++++++++++++++++++++++++++++++++---
 kernel/sched/sched.h         |   6 +-
 8 files changed, 507 insertions(+), 91 deletions(-)

-- 
2.51.0.536.g15c5d4f767-goog