Protocol ossification

Posted Jan 24, 2025 20:55 UTC (Fri) by NYKevin (subscriber, #129325)
In reply to: Protocol ossification by tialaramex
Parent article: The trouble with the new uretprobes

> Instead: Seccomp should have separate allow & block lists. If you neither allow nor block a system call, what happens depends on the kernel. So NOW there's an opportunity for a policy debate about what goes in which pile and why, but it's not a magic special case; it's just a kernel policy decision, which is nothing new.

Seccomp does not have allowlists or blocklists (except for SECCOMP_SET_MODE_STRICT, which sets an extremely restrictive hard-coded allowlist for compute-only applications). If you want to customize its behavior, you have to use BPF filters. But a BPF filter is just arbitrary-ish code. You can write whatever logic you want - if you want to have an allowlist, a blocklist, and some userspace-configurable behavior for unrecognized syscalls, you can do that today.
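To make that concrete, here is a rough sketch of the decision logic such a filter could encode. It is a plain Python model, not BPF, and nothing in it touches the kernel: `make_filter`, `Verdict`, and the number 9999 (standing in for a syscall the filter author has never heard of) are made up for illustration. The read, write, and ptrace numbers are the x86-64 ones.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"


def make_filter(allowed, blocked, default):
    """Build a function that maps a syscall number to a Verdict.

    This only models the decision a seccomp BPF program would make;
    a real filter would be a BPF program inspecting the syscall number.
    """
    allowed = frozenset(allowed)
    blocked = frozenset(blocked)
    if allowed & blocked:
        raise ValueError("syscall listed as both allowed and blocked")

    def decide(nr):
        if nr in blocked:
            return Verdict.DENY
        if nr in allowed:
            return Verdict.ALLOW
        # Anything the author did not list gets the configurable default.
        return default

    return decide


# x86-64 syscall numbers: read=0, write=1, ptrace=101.
READ, WRITE, PTRACE = 0, 1, 101
UNKNOWN = 9999  # hypothetical: a syscall added after the filter was written

strict = make_filter({READ, WRITE}, {PTRACE}, default=Verdict.DENY)
lenient = make_filter({READ, WRITE}, {PTRACE}, default=Verdict.ALLOW)

print(strict(UNKNOWN), lenient(UNKNOWN))
```

The only difference between `strict` and `lenient` is what happens to numbers in neither list, which is the part userspace is free to choose.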

In practice, everybody just writes a simple allowlist implementation for their BPF filter, but the kernel did not make them do that.

Protocol ossification

Posted Jan 25, 2025 14:46 UTC (Sat) by tialaramex (subscriber, #21167)

It's all very well to say the kernel didn't "make them" choose this, but what else was available? Imagine that next week we're adding a new kernel system call. You know nothing about it except that it doesn't exist yet. Now, what does your BPF filter say to ensure that this call, in addition to the existing set, is callable, but no others?

Actually wait, I changed my mind: after you wrote that BPF, I'm adding three new calls. Two you obviously need to allow and one you definitely must not. Does that just work with the BPF you wrote for the original statement?

Protocol ossification

Posted Jan 25, 2025 14:58 UTC (Sat) by intelfx (subscriber, #130118)

>>> Instead: Seccomp should have separate allow & block lists. If you neither allow nor block a system call, what happens depends on the kernel. So NOW there's an opportunity for a policy debate about what goes in which pile and why, but it's not a magic special case; it's just a kernel policy decision, which is nothing new.
>>
>> In practice, everybody just writes a simple allowlist implementation for their BPF filter, but the kernel did not make them do that.
>
> It's all very well to say the kernel didn't "make them" choose this, but what else was available? Imagine that next week we're adding a new kernel system call. You know nothing about it except that it doesn't exist yet. Now, what does your BPF filter say to ensure that this call, in addition to the existing set, is callable, but no others?

The problem isn't even that seccomp does not have a rigid whitelist/blacklist mechanic (as NYKevin says). One could, for instance, let the BPF program return a third verdict, "defer to the kernel's judgment", in addition to the "allow" and "deny" verdicts.

However, in order to be able to have a meaningful "default policy" in the kernel, there has to be some agreed-upon overarching semantics for the entire seccomp mode 2 mechanism, and there isn't. Seccomp is just "a mechanism to mess with syscalls". Someone might use it for security, someone else might use it for debugging, or even a rudimentary form of fault injection. There is no way to have a meaningful default policy in the kernel when there is no predefined goal that this policy must fulfill.
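For illustration only, the third verdict could be modelled like this. It is a Python sketch, not kernel code: seccomp has no "defer" return value today, and `DEFER`, `KERNEL_POLICY`, `kernel_default`, and the syscall numbers 9001 to 9003 are all invented. The `KERNEL_POLICY` table is exactly the piece that, per the argument above, cannot be filled in without first agreeing on what seccomp is for.

```python
ALLOW, DENY, DEFER = "allow", "deny", "defer"  # DEFER is hypothetical

# What the filter author knew about when the filter was written
# (x86-64 numbers: read=0, write=1, ptrace=101).
KNOWN_ALLOWED = {0, 1}
KNOWN_BLOCKED = {101}

# Hypothetical kernel-side defaults for three syscalls added later:
# two that should pass, one that should not.
KERNEL_POLICY = {9001: ALLOW, 9002: ALLOW, 9003: DENY}


def user_filter(nr):
    """The userspace filter: it only knows the syscalls of its own era."""
    if nr in KNOWN_ALLOWED:
        return ALLOW
    if nr in KNOWN_BLOCKED:
        return DENY
    return DEFER  # unknown number: hand the decision back


def kernel_default(nr):
    """Hypothetical kernel policy; deny anything it has no opinion on."""
    return KERNEL_POLICY.get(nr, DENY)


def resolve(nr):
    """Final outcome: the filter's verdict, or the kernel's if it deferred."""
    verdict = user_filter(nr)
    return kernel_default(nr) if verdict == DEFER else verdict


for nr in (0, 101, 9001, 9002, 9003):
    print(nr, resolve(nr))
```

The old filter keeps working unchanged when the three new calls appear, but only because someone decided what goes into `KERNEL_POLICY`. A table tuned for sandboxing would be wrong for a user doing fault injection or debugging.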