Hard disagree

Posted Jan 24, 2025 20:28 UTC (Fri) by intelfx (subscriber, #130118)
In reply to: Hard disagree by ibukanov
Parent article: The trouble with the new uretprobes

> It does not matter if it was the kernel that injected the code. The code runs in the user space and so a malicious code can also try to call the uretprobe syscall.

It does, though. As said above, the implementation handles this case.

> And the implementation may have bugs

Just like any other functionality.

> As a defense in depth one may want to block all that functionality until the code is mature and sufficiently tested to be added to the whitelist.

As said elsewhere by another commenter, if the goal is "defense-in-depth" conservatism, then it's the wrong layer to make these restrictions at. If it's a defense-in-depth mechanism, then it must be handled with a more suitable mechanism, like a separate sysctl or in the same vein as other ptrace-related restrictions.

In other words: if the goal is to protect against "immature not sufficiently tested code", then it's a policy decision that must be taken by the local administrator, not by every single application which **has nothing to do with the syscall being injected**.

Hard disagree

Posted Jan 25, 2025 19:43 UTC (Sat) by ibukanov (subscriber, #3942)

> In other words: if the goal is to protect against "immature not sufficiently tested code", then it's a policy decision that must be taken by the local administrator, not by every single application

The problem was caused by Docker, not the application code. Configuring Docker's default policy is the responsibility of the administrator, or at least the distribution, not of applications.

And Docker is absolutely right here. Its policy is about minimizing the attack surface against the kernel.

Hard disagree

Posted Jan 25, 2025 20:12 UTC (Sat) by intelfx (subscriber, #130118)

> The problem was caused by Docker, not the application code.

Well, that's even worse. That's double "spooky action at a distance".

> Its policy is about minimizing the attack surface against the kernel.

You're making precisely zero sense. It's not Docker's business to accidentally restrict the administrator from injecting tracepoints using unrelated mechanisms into unrelated applications, and it's not Docker's business to enact such policy (even if it was intentional, which it is not, due to lousy architecture all around).

Hard disagree

Posted Jan 26, 2025 1:39 UTC (Sun) by wahern (subscriber, #37304)

> > And the implementation may have bugs

> Just like any other functionality.

Right. That's the whole point of seccomp--that the software, both userspace *and* kernel, might have exploitable bugs, and that minimizing exposed kernel surface area is no less important (if not *more* important) than in-process mitigations for userspace code. seccomp has very little value as a defensive, mitigation layer if it's not deny by default.
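To make the deny-by-default point concrete, here is a toy model. It is not Docker's actual profile or a real seccomp-BPF program; the allowlist contents and the `filter_syscall` helper are made up for illustration. It only shows why a syscall added to the kernel after the allowlist was written gets rejected without anybody ever deciding to block it.

```python
import errno

# Hypothetical allowlist, written before the new syscall existed.
ALLOWED_SYSCALLS = frozenset(
    {"read", "write", "openat", "close", "mmap", "exit_group"}
)


def filter_syscall(name: str) -> int:
    """Return 0 to allow the call, or a negative errno to reject it."""
    if name in ALLOWED_SYSCALLS:
        return 0
    # Deny by default: anything the policy author never listed is rejected,
    # including syscalls that did not exist when the list was written.
    return -errno.EPERM


for name in ("write", "uretprobe"):
    verdict = filter_syscall(name)
    print(name, "allowed" if verdict == 0 else "rejected")
```

The flip side, which is what this subthread is arguing about, is that the policy cannot tell a syscall issued by the application apart from one issued by code the kernel injected on the administrator's behalf.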

Someone else mentioned that everybody should be focused on capability systems, not seccomp. Well, I don't think many on the seccomp side would disagree that the community should be designing, implementing, and adopting capability systems more strongly. But that takes highly coordinated effort across all layers of the stack that the Linux software ecosystem in particular hasn't been particularly successful at. After all, what's the capability story with uretprobe? There's no file descriptor/token involved. How would it even work--the profiler is effectively injecting code in the application. And presuming there was some proper capability system involved, Docker's strategy here is to impose a jail without the cooperation of the application (it's the administrator, via Docker, making a policy decision for an application that itself isn't even aware of the mechanism), and in a proper capability system Docker would likely be a broker that would presumably deny by default. Docker exists for the same reason seccomp exists--because administrative tooling is easier for the mainstream to adopt than to coordinate refactoring of application stacks.

There are no easy answers, here. OpenBSD has pledge, a saner, more comprehensive seccomp, and it works very well there. But pledge is premised on the notion that each application is refactored to make proper use of it; it doesn't work much better as a practical matter than seccomp when imposed administratively. FreeBSD has Capsicum, a capability architecture. But FreeBSD has a much more diverse ecosystem, refactoring for Capsicum is a much heavier lift, and it's seen little uptake by non-core software.

Hard disagree

Posted Jan 26, 2025 2:03 UTC (Sun) by intelfx (subscriber, #130118)

> Right. That's the whole point of seccomp--that the software, both userspace *and* kernel, might have exploitable bugs, and that minimizing exposed kernel surface area is no less important (if not *more* important) than in-process mitigations for userspace code. seccomp has very little value as a defensive, mitigation layer if it's not deny by default.
> <...>
> Docker's strategy here is to impose a jail without the cooperation of the application (it's the administrator, via Docker, making a policy decision for an application that itself isn't even aware of the mechanism), and in a proper capability system Docker would likely be a broker that would presumably deny by default. Docker exists for the same reason seccomp exists--because administrative tooling is easier for the mainstream to adopt than to coordinate refactoring of application stacks.

*What* capability would Docker be denying by default, if this was a capability-based system?

There is no (hypothetical) capability that is being used by the target application. I, as the administrator (presumably in possession of root-equivalent privileges), am requesting the kernel to inject some code into the target application on my behalf. Nothing else in the system (not Docker, not the target application) has any business meddling with this request in any way.

Seccomp has about as much reason to block this pseudo-syscall as it has to block, say, a trap instruction. Seccomp doesn't block trap instructions, now does it? They are entry points into the kernel too, after all.

Hard disagree

Posted Jan 26, 2025 2:10 UTC (Sun) by intelfx (subscriber, #130118)

> I, as the administrator (presumably in possession of root-equivalent privileges), am requesting the kernel to inject some code into the target application on my behalf.

Slight correction: I am requesting the kernel to "do something" to let me trace the application. What this "something" is is an implementation detail of the kernel. This implementation detail has its own protections against being abused. So it makes even less sense that Docker can somehow interfere with this implementation detail.

Like I said: it's as if we had to use seccomp to whitelist internal kernel functions that are being invoked during the course of execution of an (otherwise allowed) syscall, on the grounds that "if the kernel is calling some new functions, those represent untested code paths which we want to deny by default because they are untested and immature".

It makes no sense.

Hard disagree

Posted Jan 26, 2025 8:33 UTC (Sun) by ibukanov (subscriber, #3942)

Malicious userspace code can execute the new syscall even if no system administrator has asked for it. One can argue that the implementation on the kernel side is bulletproof and clearly rejects such attempts, but past experience is full of cases where this turned out to be false.

So seccomp is right to reject this case. The trap case is fundamentally different because that code is extremely mature and well tested, allowing seccomp to trust that by default.

Hard disagree

Posted Jan 26, 2025 14:40 UTC (Sun) by intelfx (subscriber, #130118)

> The trap case is fundamentally different because that code is extremely mature and well tested, allowing seccomp to trust that by default.

Seccomp never "distrusted" trap instructions. It cannot prevent trap instructions from being executed, and never could. It's not because "seccomp trusts that by default", it's because trap instructions are out of scope for seccomp: always were, always will be.
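A rough way to picture the scoping, as a deliberately simplified model rather than actual kernel code: the `kernel_entry` dispatcher, the entry kinds, and the allowlist below are all hypothetical. The only point is that the filter hook sits on the syscall path and nowhere else.

```python
import errno

# Hypothetical allowlist for the model.
ALLOWED = frozenset({"read", "write"})


def seccomp_filter(name: str) -> int:
    """Toy filter: 0 allows the syscall, a negative errno rejects it."""
    return 0 if name in ALLOWED else -errno.EPERM


def kernel_entry(kind: str, detail: str) -> str:
    """Model two ways of entering the kernel from userspace."""
    if kind == "syscall":
        # The filter is consulted here, and only here.
        if seccomp_filter(detail) != 0:
            return f"syscall {detail}: rejected by filter"
        return f"syscall {detail}: handled"
    if kind == "trap":
        # No filter hook on this path: the trap is simply handled.
        return f"trap {detail}: handled"
    raise ValueError(f"unknown entry kind: {kind}")


print(kernel_entry("trap", "breakpoint"))
print(kernel_entry("syscall", "uretprobe"))
```

In this model the same tracing request succeeds or fails purely depending on which entry path the kernel happened to pick for it.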

So no, this reasoning is invalid.

Hard disagree

Posted Jan 26, 2025 12:25 UTC (Sun) by glettieri (subscriber, #15705)

> This implementation detail has its own protections against being abused.

I may be wrong, but I think you are missing a point here. The protection only rejects calls that come from outside the injected trampoline (or perhaps from anywhere other than the exact syscall location within it). But an attacker who has hijacked the control flow in the traced application can make it jump into the trampoline and issue a uretprobe syscall that passes the protection check. Therefore, if there are bugs in the uretprobe implementation, the injected trampoline potentially exposes those bugs to the attacker.
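A small sketch of what I mean, assuming the check works roughly as described above. This is a toy model, not the kernel's implementation; the `Trampoline` type, the addresses, and the `origin_check` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trampoline:
    start: int           # hypothetical address of the injected trampoline
    syscall_offset: int  # hypothetical offset of its syscall instruction

    @property
    def syscall_ip(self) -> int:
        return self.start + self.syscall_offset


def origin_check(caller_ip: int, tramp: Trampoline) -> bool:
    """Accept the call only if it was issued from the trampoline's syscall site."""
    return caller_ip == tramp.syscall_ip


tramp = Trampoline(start=0x7FFF_F000, syscall_offset=0x10)

# Case 1: the attacker issues the syscall from their own code.
attacker_code_ip = 0x5555_0000
print("direct call passes:", origin_check(attacker_code_ip, tramp))

# Case 2: the attacker jumps into the trampoline instead, so the syscall is
# issued by the trampoline's own instruction and the check sees its address.
print("jump into trampoline passes:", origin_check(tramp.syscall_ip, tramp))
```

The check tells the two call sites apart, but it says nothing about how control flow arrived at the trampoline in the first place.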

Hard disagree

Posted Jan 26, 2025 14:51 UTC (Sun) by intelfx (subscriber, #130118)

> I think you are missing a point here

I think you are missing mine. How is it different from an application hijacking control flow or whatever to jump to the previous implementation of this mechanism, i.e., a trap instruction? The answer is "it's not", and we were okay with it.

This argument is clearly going in circles, so in order not to incur the wrath of our editors, I will stop participating in this subthread. (However, I must note that this is not equal to conceding.)

Hard disagree

Posted Jan 26, 2025 15:35 UTC (Sun) by glettieri (subscriber, #15705)

> I think you are missing mine. How is it different from an application hijacking control flow or whatever to jump to the previous implementation of this mechanism, i.e., a trap instruction? The answer is "it's not", and we were okay with it.

Good point, I see what you mean now.