Seccomp filters for multi-threaded programs
In current kernels, a process can apply a filter program to itself with the prctl() system call:
prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, filter);
Where filter is a pointer to a sock_fprog structure containing the BPF program to be applied. Multiple programs can be added with multiple prctl() calls; each will be executed in sequence and any can reject a system call. There is no mechanism for removing filters once they have been applied to a process. Adding filters is normally a privileged operation; otherwise there is a real risk of privilege escalation via setuid programs that do not expect some operations to be denied (see this old sendmail vulnerability for an example). But any process may set filters on itself if it has first called:
prctl(PR_SET_NO_NEW_PRIVS, 1);
to disable the addition of any privileges to the process. In particular, a process marked as "no new privileges" cannot gain capabilities or access to a different user ID by running setuid or setgid programs.
What is lacking in the current interface is any way for a process to apply filters to a different process (or thread). There does not seem to be a use case for the ability to add filters to arbitrary processes; among other things, trying to contain a program that is already off and running would be a recipe for unpleasant race conditions. But it seems that there is value in allowing a thread to apply filters to its sibling threads. In the absence of this ability, it can be hard to ensure that a seccomp filter applies to all threads running as part of a process. Threads inherit their parent's filters when they are created, but any threads created before the filters are applied will remain uncontained. It may not always be practical to set up the filters before any threads are created, so the ability to attach them to threads after creation is a useful way to ensure that no part of a program escapes filtering.
Adding that ability is the object of this patch set from Kees Cook. All Kees really needed to do was to add an "apply this filter to all threads" flag to the PR_SET_SECCOMP operation, but, as so often seems to be the case, that operation was defined without the ability to pass in additional flags to modify its behavior. So, instead, Kees has added a new operation:
prctl(PR_SECCOMP_EXT, SECCOMP_EXT_ACT, SECCOMP_EXT_ACT_FILTER, flags, filter);
If flags is zero, this operation behaves just like the PR_SET_SECCOMP example above; it attaches filter to the calling process. But if the SECCOMP_FILTER_TSYNC flag is set, the given filter (along with any other filters already applied to the calling process) will be applied to all threads in the process's thread group, thus ensuring that all threads are running with the same set of filters.
There is one other new operation:
prctl(PR_SECCOMP_EXT, SECCOMP_EXT_ACT, SECCOMP_EXT_ACT_TSYNC, 0, 0);
This one will apply the calling process's filters to all other threads without making any changes to the filters themselves.
In either case, other threads will only have their filtering changed if whatever filter they currently have applied is an "ancestor" of the filters running on the calling process. Essentially, any filters applied to the target thread must also have been applied to the calling thread; any thread that has a totally unrelated filter will not have its filtering changed. If a thread is not running with a filter at all, it will be put into the seccomp mode and the filters will be applied. Also, if the calling thread has the "no new privileges" mode set, that mode will be set on all other threads as well.
This is the fifth version of this patch set; the previous attempts needed
work in response to locking and other issues. Unless another problem turns
up, this code should be about ready for merging. There does not appear to
be any opposition to the concept, so this feature could find its way into
the mainline as early as 3.16.
| Index entries for this article | |
|---|---|
| Kernel | Security/seccomp |
