PostgreSQL considers seccomp() filters
Posted Oct 2, 2019 14:58 UTC (Wed) by brauner (subscriber, #109349)
In reply to: PostgreSQL considers seccomp() filters by cyphar
Parent article: PostgreSQL considers seccomp() filters
> And an overarching problem is that (for unknown-to-userspace syscalls), the best you can really do is block the syscall outright. But maybe some pledge(2)s should only block certain flags (an obvious example would be a hypothetical socket(2)-like syscall -- how would you implement pledge(2) for "only allow unix sockets" in user-space without having code that knows about socket(2)?). The last proposal might help solve this if you exposed "enough" metadata, but it feels wrong to me to try to expose a bunch of metadata in the hopes that userspace will be able to make sense of it.
You mean btf...
(I'm only partially trolling btw.)
We don't actually need pledge(2). seccomp(2) could be extended to do this. There's even precedent: SECCOMP_SET_MODE_STRICT already restricts you to a very limited set of syscalls. We could add seccomp(SECCOMP_PLEDGE, 0, "stdio,sendfd,recvfd") and then seccomp would just create a BPF filter. Or, more elaborately, for future extensibility :):
struct seccomp_pledge pledge;
seccomp(SECCOMP_PLEDGE, 0, &pledge);
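For readers wondering what "seccomp would just create a BPF filter" could look like, here is a minimal userspace sketch of the translation step: a promise string is turned into a classic-BPF allow-list. Everything specific in it is an assumption made for illustration: the promise names are taken from the example string above, but the syscall lists behind them, the table layout, and the function name build_pledge_filter() are hypothetical, and SECCOMP_PLEDGE itself does not exist in any kernel.

```c
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical promise table: the syscall lists are illustrative guesses,
 * not something the kernel or OpenBSD defines. */
struct promise {
    const char *name;
    const long *calls;
    size_t ncalls;
};

static const long stdio_calls[]  = { SYS_read, SYS_write, SYS_close, SYS_exit_group };
static const long sendfd_calls[] = { SYS_sendmsg };
static const long recvfd_calls[] = { SYS_recvmsg };

static const struct promise promises[] = {
    { "stdio",  stdio_calls,  COUNT(stdio_calls) },
    { "sendfd", sendfd_calls, COUNT(sendfd_calls) },
    { "recvfd", recvfd_calls, COUNT(recvfd_calls) },
};

static const struct promise *find_promise(const char *s, size_t len)
{
    for (size_t i = 0; i < COUNT(promises); i++)
        if (strlen(promises[i].name) == len &&
            memcmp(promises[i].name, s, len) == 0)
            return &promises[i];
    return NULL;
}

/* Build an allow-list filter from a comma-separated promise list.
 * Returns the number of instructions written, or -1 if a promise is
 * unknown or the output buffer is too small.
 * NOTE: a real filter must also validate seccomp_data.arch first;
 * that check is left out of this sketch. */
long build_pledge_filter(const char *list, struct sock_filter *out, size_t cap)
{
    size_t n = 0;

    if (cap < 2)
        return -1;
    /* Load the syscall number into the accumulator. */
    out[n++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                            offsetof(struct seccomp_data, nr));
    while (*list) {
        size_t len = strcspn(list, ",");
        const struct promise *p = find_promise(list, len);

        if (!p)
            return -1;
        for (size_t i = 0; i < p->ncalls; i++) {
            if (n + 3 > cap)            /* two insns now, one final return */
                return -1;
            /* If nr matches, fall through to ALLOW; otherwise skip it. */
            out[n++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                                                    (unsigned int)p->calls[i], 0, 1);
            out[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                                                    SECCOMP_RET_ALLOW);
        }
        list += len;
        if (*list == ',')
            list++;
    }
    /* Anything not explicitly allowed kills the process. */
    out[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                                            SECCOMP_RET_KILL_PROCESS);
    return (long)n;
}
```

To actually install the result from userspace today, one would wrap the array in a struct sock_fprog, set PR_SET_NO_NEW_PRIVS via prctl(2), and pass it to seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog). The point of doing the mapping in-kernel would be that the table above tracks the running kernel rather than whatever the application was built against.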
Posted Oct 2, 2019 15:08 UTC (Wed)
by cyphar (subscriber, #110703)
[Link] (5 responses)
I was thinking of BTF while writing it, though I don't know if BTF currently gives us the details we want -- we don't just want lists of functions and structure layouts. We need to have a way for the kernel to tell userspace "this syscall is part of the net/tcp pledge-set" or something similar (and probably a way to indicate "if this flag is set then the syscall is (also?) part of the foobar pledge-set").
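To make the "only allow unix sockets" case concrete, the following is a rough sketch of a seccomp filter that inspects an argument rather than just the syscall number. It is exactly the kind of code that has to know about socket(2) up front, which is the problem being described. The array name unix_sockets_only and the choice of EAFNOSUPPORT as the error are assumptions for the example. The filter allows every other syscall, so it is an argument-filtering demo and not a complete sandbox, and it omits the seccomp_data.arch check a real filter needs.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* seccomp_data.args[] entries are 64 bits wide, but classic BPF loads
 * 32 bits at a time; pick the half that holds the low word. */
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define ARG0_LOW (offsetof(struct seccomp_data, args[0]) + 4)
#else
#define ARG0_LOW offsetof(struct seccomp_data, args[0])
#endif

static const struct sock_filter unix_sockets_only[] = {
    /* 0: load the syscall number */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* 1: not socket(2)? jump to insn 5 (allow) */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_socket, 0, 3),
    /* 2: load the first argument (the address family) */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, ARG0_LOW),
    /* 3: AF_UNIX? jump to insn 5 (allow), else fall through to insn 4 */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AF_UNIX, 1, 0),
    /* 4: any other family fails with EAFNOSUPPORT */
    BPF_STMT(BPF_RET | BPF_K,
             SECCOMP_RET_ERRNO | (EAFNOSUPPORT & SECCOMP_RET_DATA)),
    /* 5: allow */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};
```

Note how much is baked in: the syscall number, which argument carries the family, and its width. A new socket(2)-like syscall would sail straight past this filter, and on 32-bit architectures where sockets have historically been created through socketcall(2) this check would not see them at all.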
> We could extend seccomp(SECCOMP_PLEDGE, 0, "stdio,sendfd,recvfd") ...
"What's in a name? That which we call [pledge(2)]
By any other name would [provide the same functionality];"
#define pledge(list) seccomp(SECCOMP_PLEDGE, 0, list)
But yes, in that case we are in agreement -- let's do it in-kernel (but taking care to be incompatible with OpenBSD, so that we can pretend we came up with the idea :P).
Posted Oct 2, 2019 20:37 UTC (Wed)
by wahern (subscriber, #37304)
[Link] (4 responses)
OpenBSD added the sendsyslog syscall so that processes could be denied socket access but still be able to use the syslog facility. I don't think this could be emulated with seccomp, either, as filtering on the sun.sun_path argument to bind(2) has the same problems as filtering on the path argument to open(2). You could require processes to open the socket before dropping privileges, but what happens if the syslogd daemon restarts?
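The "open the socket before dropping privileges" workaround mentioned above might look roughly like the sketch below. The helper names open_log_socket() and log_line() are made up for the example, and the socket path is passed in by the caller (conventionally /dev/log, though that is an assumption about the system's syslog setup).

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Call this BEFORE installing any filter: it needs socket(2) and
 * connect(2). Returns a connected datagram socket, or -1 on failure. */
int open_log_socket(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);        /* length checked above */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* After sandboxing, only a send on the inherited descriptor is needed.
 * Returns 0 on success, -1 on error. If the daemon behind the socket
 * has gone away, the only recovery is to reconnect, which requires the
 * very syscalls the sandbox has taken away. */
int log_line(int fd, const char *msg)
{
    return send(fd, msg, strlen(msg), MSG_NOSIGNAL) < 0 ? -1 : 0;
}
```

The failure path in log_line() is where this approach falls down: once the filter is in place there is no way to get a fresh connection, which is the gap sendsyslog closes on OpenBSD.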
There are several little pragmatic tweaks like this that make pledge functional. A fundamental hurdle on Linux is that the project is so large and diverse that there's enormous pressure to prevent leaky abstractions that require far flung tweaks across the system. Such tweaks are especially brittle in the Linux development model, and people are wary of unintended consequences. That's completely understandable, but sometimes such tweaks are simply unavoidable if your goal is maximizing userland convenience and security. Irreducible complexity has to be apportioned among userland and various kernel subsystems somehow; OpenBSD tends to apportion it quite differently than Linux, partly because of the different development models.
The irony is that the path of least resistance for Linux has been containers--namespaces, cgroups, etc--which has become precisely the slippery slope of complexity and code churn people feared. (Which is why OpenBSD rejected FreeBSD jails.) Not that containers weren't worth it for their own sake, I just find the path dependency and contradictions interesting.
Posted Oct 2, 2019 23:41 UTC (Wed)
by roc (subscriber, #30627)
[Link]
Posted Oct 3, 2019 6:14 UTC (Thu)
by cyphar (subscriber, #110703)
[Link] (2 responses)
In my view, this is actually a good thing -- pathname filtering based on the string value of the path is destined to be a bad idea (I explain this further in [1]). I reckon that the right combination of bind-mounts and AppArmor/SELinux would be a far more effective method for doing this without all of the foot-guns.
> There are several little pragmatic tweaks like this that make pledge functional.
I agree that these sorts of niceties are very useful for making pledge(2) much easier to use for userspace, but I'm not convinced that we need to do all of them in-kernel. There is no reason why we can't also have a libpledge which can help deal with some of the more peculiar userspace bits (that are separate from the core "these syscalls on this kernel form this pledge-group" feature we need to be in-kernel).
Posted Oct 3, 2019 8:51 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
AppArmor has several problems, though. In particular, it can't be effectively used in unprivileged contexts. For example, you can't run a program that you just compiled with a custom policy.
It also was not possible to use AppArmor from inside containers (has this changed?).
Posted Oct 3, 2019 10:48 UTC (Thu)
by jem (subscriber, #24231)
[Link]
SELinux is never a solution...