| From: |
| Sargun Dhillon <sargun-AT-sargun.me> |
| To: |
| Kees Cook <keescook-AT-chromium.org>, LKML <linux-kernel-AT-vger.kernel.org>, Linux Containers <containers-AT-lists.linux.dev> |
| Subject: |
| [PATCH v4 0/3] Handle seccomp notification preemption |
| Date: |
| Tue, 03 May 2022 01:09:55 -0700 |
| Message-ID: |
| <20220503080958.20220-1-sargun@sargun.me> |
| Cc: |
| Sargun Dhillon <sargun-AT-sargun.me>, Rodrigo Campos <rodrigo-AT-kinvolk.io>, Christian Brauner <christian.brauner-AT-ubuntu.com>, Giuseppe Scrivano <gscrivan-AT-redhat.com>, Will Drewry <wad-AT-chromium.org>, Andy Lutomirski <luto-AT-amacapital.net>, Tycho Andersen <tycho-AT-tycho.pizza>, Alban Crequy <alban-AT-kinvolk.io> |
This patchset addresses a race condition we've dealt with recently with
seccomp: programs interrupting syscalls while they're in progress. This
was exacerbated by Golang's[1] recent adoption of "Non-cooperative
goroutine preemption", in which the runtime tries to interrupt any
syscall that's been running for more than 10ms. Certain syscalls (mount,
for example) are non-trivial to handle in a reentrant manner in userspace.
The patchset adds a per-filter flag that makes the notifying process
wait in "TASK_KILLABLE" as opposed to returning to userspace on non-fatal
signals.
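As a rough illustration of how a container runtime might opt in, here is a
minimal sketch. It assumes the flag is spelled
SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV with value (1UL << 5), which is the
name and value I know from the upstream uapi header; this cover letter does
not name it, so treat that as an assumption. The helper names
wait_killable_listener_flags() and install_wait_killable_filter() are made
up for this example, and a real filter should also check seccomp_data.arch
before trusting the syscall number.

```c
#define _GNU_SOURCE
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: name and value as in the upstream uapi header; older
 * installed headers will not define it. */
#ifndef SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV
#define SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV (1UL << 5)
#endif

/* Hypothetical helper: the flag is per-filter and only makes sense for a
 * filter that also creates a user-notification listener fd. */
unsigned long wait_killable_listener_flags(void)
{
	return SECCOMP_FILTER_FLAG_NEW_LISTENER |
	       SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV;
}

/* Hypothetical helper: route mount(2) to the supervisor and allow every
 * other syscall. Returns the listener fd, or -1 with errno set. */
int install_wait_killable_filter(void)
{
	struct sock_filter insns[] = {
		/* Load the syscall number (arch check omitted in this sketch). */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		/* mount -> fall through to USER_NOTIF, else skip to ALLOW. */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mount, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
		.filter = insns,
	};

	/* Required for unprivileged callers before installing a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;

	return (int)syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER,
			    wait_killable_listener_flags(), &prog);
}
```

On a kernel without this series the seccomp() call should fail (I would
expect EINVAL for the unknown flag), so a supervisor can retry with only
SECCOMP_FILTER_FLAG_NEW_LISTENER and keep handling interrupted
notifications the old way.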
Changes since v3[4]:
* Clean up tests
* Split out helper function (dedupe code)
* Add some explanation about what's going on
* Small documentation edit
Changes since v2[3]:
* Split out addfd patches
* Move the flag to be per-filter (as opposed to per notification)
Changes since v1[2]:
* Fix some documentation
* Add Rata's patches to allow for direct return from addfd
[1]: https://github.com/golang/proposal/blob/master/design/245...
[2]: https://lore.kernel.org/lkml/20210220090502.7202-1-sargun...
[3]: https://lore.kernel.org/all/20210426180610.2363-1-sargun@...
[4]: https://lore.kernel.org/lkml/20220429023113.74993-1-sargu...
Sargun Dhillon (3):
seccomp: Add wait_killable semantic to seccomp user notifier
selftests/seccomp: Refactor get_proc_stat to split out file reading code
selftests/seccomp: Add test for wait killable notifier
.../userspace-api/seccomp_filter.rst | 10 +
include/linux/seccomp.h | 3 +-
include/uapi/linux/seccomp.h | 2 +
kernel/seccomp.c | 42 ++-
tools/testing/selftests/seccomp/seccomp_bpf.c | 282 +++++++++++++++++-
5 files changed, 320 insertions(+), 19 deletions(-)
--
2.25.1