| From: | Christian Brauner <brauner-AT-kernel.org> |
| To: | linux-fsdevel-AT-vger.kernel.org |
| Subject: | [PATCH 00/17] eventpoll: clarity refactor |
| Date: | Fri, 24 Apr 2026 15:46:31 +0200 |
| Message-ID: | <20260424-work-epoll-rework-v1-0-249ed00a20f3@kernel.org> |
| Cc: | Alexander Viro <viro-AT-zeniv.linux.org.uk>, Jan Kara <jack-AT-suse.cz>, Linus Torvalds <torvalds-AT-linux-foundation.org>, Jens Axboe <axboe-AT-kernel.dk>, "Christian Brauner (Amutable)" <brauner-AT-kernel.org> |
The recent UAF series (a6dc643c6931 and follow-ups) rode on
invariants in fs/eventpoll.c that were nowhere documented and had
to be reverse-engineered from the code: the lifetime relationships
between struct eventpoll, struct epitem, and struct file, the three
removal paths coordinating via epi_fget() pins and ep->mtx, the
ovflist sentinel-encoded scan state machine, the POLLFREE
release/acquire handshake, and the loop / path check globals
serialized by epnested_mutex. The fix was correct but the next
person to touch this code will hit the same learning curve.
This adds a bunch of documentation (a bunch of swearwords were removed
by having an LLM go over it) and refactors. The end result is hopefully a
bit more palatable than what is there right now. No functional changes
intended yet.
This series codifies those invariants in source and tightens the
surrounding structure.
First come the pure documentation changes: a top-of-file
overview with field-protection tables for struct eventpoll and struct
epitem, a section gathering the loop-check / path-check globals next to
their declarations, labelled comments on the two sides of the POLLFREE
handshake, refreshed comments on epi_fget() and ep_remove_file() (whose
contract the UAF fix re-shaped), and a docblock on
ep_clear_and_put() that names its two-pass structure as load-bearing.
Next come the mechanical cleanups, mostly renames:
ep_refcount_dec_and_test() -> ep_put() to pair with ep_get(); the unused
depth argument dropped from epoll_mutex_lock() (all three callers passed
zero); attach_epitem() -> ep_attach_file() for ep_remove_file()
symmetry; and the CONFIG_KCMP block relocated next to CONFIG_COMPAT so
the hot-path code is contiguous.
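
To illustrate why the ep_get() / ep_put() naming pairs up, here is a minimal userspace sketch. It assumes C11 atomics as a stand-in for the kernel's refcount_t, and the struct is a stripped-down placeholder, not the real struct eventpoll layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for struct eventpoll; only the refcount is modeled. */
struct eventpoll {
	atomic_int refcount;
};

/* Take a reference. The caller must already hold one. */
static void ep_get(struct eventpoll *ep)
{
	int old = atomic_fetch_add_explicit(&ep->refcount, 1,
					    memory_order_relaxed);

	/* Taking a reference on a dead object is a bug. */
	assert(old > 0);
}

/*
 * Drop a reference. Returns true when this was the last one, in which
 * case the caller is responsible for freeing @ep.
 */
static bool ep_put(struct eventpoll *ep)
{
	return atomic_fetch_sub_explicit(&ep->refcount, 1,
					 memory_order_acq_rel) == 1;
}
```

With the old name, a reader had to know that ep_refcount_dec_and_test() was the counterpart of ep_get(); with ep_put() the pairing is visible at every call site.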
Next are a couple of changes that extract long bodies into named
helpers. ep_insert() splits into ep_alloc_epitem() and
ep_register_epitem(); ep_clear_and_put()'s two passes become
ep_drain_pollwaits() and ep_drain_tree() so the ordering invariant is
enforced by the call sequence rather than convention; the per-event
delivery loop body extracts from ep_send_events() as ep_deliver_event();
and the ep->mtx + epnested_mutex acquisition dance lifts out of
do_epoll_ctl() into ep_ctl_lock() / ep_ctl_unlock(), with a return value
that doubles as the @full_check argument to ep_insert().
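
The point about ep_clear_and_put() can be sketched roughly as follows. This is a userspace toy under stated assumptions: the structs are placeholders, a fixed array replaces the tree, and the reference drop at the end of the real function is left out. Only the helper names come from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct epitem. */
struct epitem {
	bool pollwait_registered;	/* still hooked on a wait queue */
	bool in_tree;			/* still linked in the eventpoll */
};

/* Simplified stand-in for struct eventpoll: an array, not an rbtree. */
struct eventpoll {
	struct epitem *items;
	size_t nr_items;
};

/* Pass 1: detach every item from its wait queue so no callback can fire. */
static void ep_drain_pollwaits(struct eventpoll *ep)
{
	for (size_t i = 0; i < ep->nr_items; i++)
		ep->items[i].pollwait_registered = false;
}

/* Pass 2: unlink every item. Only valid once pass 1 has completed. */
static void ep_drain_tree(struct eventpoll *ep)
{
	for (size_t i = 0; i < ep->nr_items; i++) {
		assert(!ep->items[i].pollwait_registered);
		ep->items[i].in_tree = false;
	}
}

/* The ordering lives in this call sequence rather than in a comment. */
static void ep_clear_and_put(struct eventpoll *ep)
{
	ep_drain_pollwaits(ep);
	ep_drain_tree(ep);
}
```

Swapping the two calls in the sketch trips the assertion in the second pass, which is the kind of mistake the split is meant to make obvious.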
Next are a couple of changes that address sentinel and predicate sprawl.
The EP_UNACTIVE_PTR overload (meaning "no scan in progress" on
ep->ovflist and "epi not on ovflist" on epi->next) is hidden behind
named helpers (ep_is_scanning, epi_on_ovflist, ...); epi->next is
renamed to epi->ovflist_next and the local txlist to scan_batch; and
is_file_epoll(), ep_is_linked(), ep_events_available() are converted to
return bool to match their already-boolean bodies.
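
A rough userspace sketch of what the sentinel helpers look like. ep_is_scanning() and epi_on_ovflist() are the names used in the series; the two *_sketch helpers and the stripped-down structs are made up for illustration and are not what the patches contain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One sentinel value, two meanings depending on where it is stored. */
#define EP_UNACTIVE_PTR ((void *)-1L)

struct epitem {
	struct epitem *ovflist_next;	/* EP_UNACTIVE_PTR: not on ovflist */
};

struct eventpoll {
	struct epitem *ovflist;		/* EP_UNACTIVE_PTR: no scan running */
};

/* True while a scan is in progress and events go to the overflow list. */
static bool ep_is_scanning(const struct eventpoll *ep)
{
	return ep->ovflist != EP_UNACTIVE_PTR;
}

/* True if @epi is currently chained on the overflow list. */
static bool epi_on_ovflist(const struct epitem *epi)
{
	return epi->ovflist_next != EP_UNACTIVE_PTR;
}

/* Hypothetical helper: start a scan with an empty overflow list. */
static void ep_start_scan_sketch(struct eventpoll *ep)
{
	ep->ovflist = NULL;
}

/* Hypothetical helper: push @epi on the overflow list during a scan. */
static void ep_ovflist_push_sketch(struct eventpoll *ep, struct epitem *epi)
{
	assert(ep_is_scanning(ep));
	epi->ovflist_next = ep->ovflist;
	ep->ovflist = epi;
}
```

Callers then read as questions about state ("is a scan running?") instead of comparisons against a pointer constant whose meaning depends on the field.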
And last we move the per-CTL_ADD scratch state (tfile_check_list,
path_count[], inserting_into) from file-scope globals into a
stack-allocated struct ep_ctl_ctx plumbed through the loop / path check
chain. loop_check_gen stays at file scope because the stamp it leaves on
ep->gen across calls must not collide with a future walk.
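
The shape of that change, as a userspace sketch. Only path_count[] and loop_check_gen are modeled; the limit values, ep_ctl_ctx_begin_sketch() and the exact signature of path_count_inc() are assumptions for illustration and not taken from the patch.

```c
#include <string.h>

#define PATH_ARR_SIZE 5

/* Illustrative per-depth limits; treat the values as placeholders. */
static const int path_limits[PATH_ARR_SIZE] = { 1000, 500, 100, 50, 10 };

/* Stays at file scope: stamps left behind must be unique across walks. */
static unsigned long long loop_check_gen;

/* Per-CTL_ADD scratch state, owned by the caller's stack frame. */
struct ep_ctl_ctx {
	int path_count[PATH_ARR_SIZE];
};

/* Hypothetical helper: start a fresh walk with zeroed scratch state. */
static void ep_ctl_ctx_begin_sketch(struct ep_ctl_ctx *ctx)
{
	memset(ctx, 0, sizeof(*ctx));
	loop_check_gen++;
}

/* Count one more path at depth @nests; -1 if the depth limit is hit. */
static int path_count_inc(struct ep_ctl_ctx *ctx, int nests)
{
	if (nests < 0 || nests >= PATH_ARR_SIZE)
		return -1;
	if (++ctx->path_count[nests] > path_limits[nests])
		return -1;
	return 0;
}
```

Because every walk brings its own context, nothing has to be reset on the way out, and the lifetime of the scratch state is the lifetime of the call.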
The load-bearing invariants the UAF series closed are preserved
verbatim: the epi_fget() pin in ep_remove(), the ordering of
ep_unregister_pollwait() before ep_remove_file() / ep_remove_epi()
in all three removal paths, kfree_rcu(epi) and kfree_rcu(ep), the
POLLFREE smp_store_release / smp_load_acquire pair on pwq->whead,
ep->lock IRQ-safety, the mutex_lock_nested() subclass arithmetic
in ep_insert (subclass 0 outer, 1 for tep) and __ep_eventpoll_poll
/ ep_loop_check_proc (depth-based), and the WARN_ON_ONCE contract
on ep_put() in ep_remove().
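
For readers unfamiliar with the POLLFREE handshake, here is a loose userspace analogue using C11 release/acquire atomics in place of smp_store_release() / smp_load_acquire(). The structs, the nr_waiters counter and pollfree_release_sketch() are invented for the sketch; the real code also relies on RCU and the wait queue lock, which are not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder wait queue: just counts how many entries are queued. */
struct wait_queue_head {
	int nr_waiters;
};

/* Placeholder for struct eppoll_entry; only whead is modeled. */
struct eppoll_entry {
	_Atomic(struct wait_queue_head *) whead;
};

/*
 * Waker side (POLLFREE): unhook the entry first, then publish
 * whead = NULL with release semantics so the unhook is visible
 * to anyone who observes the NULL.
 */
static void pollfree_release_sketch(struct eppoll_entry *pwq)
{
	struct wait_queue_head *whead =
		atomic_load_explicit(&pwq->whead, memory_order_relaxed);

	if (whead)
		whead->nr_waiters--;	/* stand-in for list_del_init() */
	atomic_store_explicit(&pwq->whead, (struct wait_queue_head *)NULL,
			      memory_order_release);
}

/*
 * Removal side: load whead with acquire semantics. NULL means the
 * waker already unhooked us and the queue must not be touched.
 * Returns true if this side did the unhooking.
 */
static bool ep_remove_wait_queue(struct eppoll_entry *pwq)
{
	struct wait_queue_head *whead =
		atomic_load_explicit(&pwq->whead, memory_order_acquire);

	if (!whead)
		return false;
	assert(whead->nr_waiters > 0);
	whead->nr_waiters--;		/* stand-in for remove_wait_queue() */
	return true;
}
```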
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
Christian Brauner (17):
eventpoll: expand top-of-file overview / locking doc
eventpoll: document loop-check / path-check globals
eventpoll: clarify POLLFREE handshake comments
eventpoll: refresh epi_fget() / ep_remove_file() comments
eventpoll: document ep_clear_and_put() two-pass pattern
eventpoll: rename ep_refcount_dec_and_test() to ep_put()
eventpoll: drop unused depth argument from epoll_mutex_lock()
eventpoll: rename attach_epitem() to ep_attach_file()
eventpoll: relocate KCMP helpers near compat syscalls
eventpoll: split ep_insert() into alloc + register stages
eventpoll: split ep_clear_and_put() into drain helpers
eventpoll: extract ep_deliver_event() from ep_send_events()
eventpoll: extract lock dance from do_epoll_ctl() into ep_ctl_lock()
eventpoll: wrap EP_UNACTIVE_PTR in typed sentinel helpers
eventpoll: rename epi->next and txlist for clarity
eventpoll: use bool for predicate helpers
eventpoll: hoist CTL_ADD scratch state into struct ep_ctl_ctx
fs/eventpoll.c | 1183 +++++++++++++++++++++++++++++++++++++-------------------
1 file changed, 778 insertions(+), 405 deletions(-)
---
base-commit: dd6c438c3e64a5ff0b5d7e78f7f9be547803ef1b
change-id: 20260424-work-epoll-rework-a02330741d24