Hard disagree
Posted Jan 24, 2025 20:28 UTC (Fri)
by intelfx (subscriber, #130118)
In reply to: Hard disagree by ibukanov
Parent article: The trouble with the new uretprobes
It does, though. As said above, the implementation handles this case.
> And the implementation may have bugs
Just like any other functionality.
> As a defense in depth one may want to block all that functionality until the code mature and sufficiently tested to add it to the white list.
As said elsewhere by another commenter, if the goal is "defense-in-depth" conservatism, then this is the wrong layer at which to impose these restrictions. A defense-in-depth restriction must be handled by a more suitable mechanism, such as a separate sysctl, in the same vein as other ptrace-related restrictions.
In other words: if the goal is to protect against "immature not sufficiently tested code", then it's a policy decision that must be taken by the local administrator, not by every single application which **has nothing to do with the syscall being injected**.
Posted Jan 25, 2025 19:43 UTC (Sat)
by ibukanov (subscriber, #3942)
The problem was caused by Docker, not the application code. Configuring Docker's default policy is the responsibility of the administrator, or at least the distribution, not of applications.
And Docker is absolutely right here. Its policy is about minimizing the attack surface against the kernel.
Posted Jan 25, 2025 20:12 UTC (Sat)
by intelfx (subscriber, #130118)
Well, that's even worse. That's double "spooky action at a distance".
> Its policy is about minimizing the attack surface against the kernel.
You're making precisely zero sense. It's not Docker's business to accidentally restrict the administrator from injecting tracepoints using unrelated mechanisms into unrelated applications, and it's not Docker's business to enact such policy (even if it was intentional, which it is not, due to lousy architecture all around).
Posted Jan 26, 2025 1:39 UTC (Sun)
by wahern (subscriber, #37304)
> Just like any other functionality.
Right. That's the whole point of seccomp--that the software, both userspace *and* kernel, might have exploitable bugs, and that minimizing exposed kernel surface area is no less important (if not *more* important) than in-process mitigations for userspace code. seccomp has very little value as a defensive, mitigation layer if it's not deny by default.
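To illustrate the deny-by-default point, here is a minimal toy model in Python. It is not real seccomp and not Docker's actual profile: the filter functions and the syscall sets are assumptions made up for the sketch. It only shows why a syscall added after an allowlist was written is blocked, while a denylist would let it through.

```python
# Toy model of syscall filtering policies. This is NOT real seccomp;
# the function names and syscall sets are illustrative assumptions.

ALLOW = "allow"
DENY = "deny"


def make_allowlist_filter(allowed):
    """Deny by default: permit only the syscalls named in `allowed`."""
    allowed = frozenset(allowed)

    def check(syscall_name):
        return ALLOW if syscall_name in allowed else DENY

    return check


def make_denylist_filter(denied):
    """Allow by default: block only the syscalls named in `denied`."""
    denied = frozenset(denied)

    def check(syscall_name):
        return DENY if syscall_name in denied else ALLOW

    return check


# Both policies were (hypothetically) written before "uretprobe" existed,
# so neither of them mentions it.
allowlist = make_allowlist_filter({"read", "write", "openat", "close"})
denylist = make_denylist_filter({"kexec_load", "reboot"})

if __name__ == "__main__":
    for name in ("read", "uretprobe"):
        print(name, "allowlist:", allowlist(name), "denylist:", denylist(name))
```

Under the allowlist, the unknown syscall is rejected without anyone having decided anything about it specifically, which is both the security property being defended here and the source of the breakage being complained about.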
Someone else mentioned that everybody should be focused on capability systems, not seccomp. Well, I don't think many on the seccomp side would disagree that the community should be designing, implementing, and adopting capability systems more strongly. But that takes highly coordinated effort across all layers of the stack that the Linux software ecosystem in particular hasn't been particularly successful at. After all, what's the capability story with uretprobe? There's no file descriptor/token involved. How would it even work--the profiler is effectively injecting code into the application. And presuming there was some proper capability system involved, Docker's strategy here is to impose a jail without the cooperation of the application (it's the administrator, via Docker, making a policy decision for an application that itself isn't even aware of the mechanism), and in a proper capability system Docker would likely be a broker that would presumably deny by default. Docker exists for the same reason seccomp exists--because administrative tooling is easier for the mainstream to adopt than to coordinate refactoring of application stacks.
There are no easy answers, here. OpenBSD has pledge, a saner, more comprehensive seccomp, and it works very well there. But pledge is premised on the notion that each application is refactored to make proper use of it; it doesn't work much better as a practical matter than seccomp when imposed administratively. FreeBSD has Capsicum, a capability architecture. But FreeBSD has a much more diverse ecosystem, refactoring for Capsicum is a much heavier lift, and it's seen little uptake by non-core software.
Posted Jan 26, 2025 2:03 UTC (Sun)
by intelfx (subscriber, #130118)
*What* capability would Docker be denying by default, if this was a capability-based system?
There is no (hypothetical) capability that is being used by the target application. I, as the administrator (presumably in possession of root-equivalent privileges), am requesting the kernel to inject some code into the target application on my behalf. Nothing else in the system (not Docker, not the target application) has any business meddling with this request in any way.
Seccomp has about as much reason to block this pseudo-syscall as it has to block, say, a trap instruction. Seccomp doesn't block trap instructions, now does it? They are entry points into the kernel too, after all.
Posted Jan 26, 2025 2:10 UTC (Sun)
by intelfx (subscriber, #130118)
Slight correction: I am requesting the kernel to "do something" to let me trace the application. What this "something" is is an implementation detail of the kernel. This implementation detail has its own protections against being abused. So it makes even less sense that Docker can somehow interfere with this implementation detail.
Like I said: it's as if we had to use seccomp to whitelist internal kernel functions that are being invoked during the course of execution of an (otherwise allowed) syscall, on the grounds that "if the kernel is calling some new functions, those represent untested code paths which we want to deny by default because they are untested and immature".
It makes no sense.
Posted Jan 26, 2025 8:33 UTC (Sun)
by ibukanov (subscriber, #3942)
So seccomp is right to reject this case. The trap case is fundamentally different because that code is extremely mature and well-tested, allowing seccomp to trust it by default.
Posted Jan 26, 2025 14:40 UTC (Sun)
by intelfx (subscriber, #130118)
Seccomp never "distrusted" trap instructions. It cannot prevent trap instructions from being executed, never did. It's not because "seccomp trusts that by default", it's because trap instructions are out of scope of seccomp, always were, always would be.
So no, this reasoning is invalid.
Posted Jan 26, 2025 12:25 UTC (Sun)
by glettieri (subscriber, #15705)
I may be wrong, but I think you are missing a point here. The protection is against calls coming from outside the injected trampoline (or even from anywhere other than the exact location in the trampoline). But an attacker who has hijacked the control flow in the traced application can make it jump into the trampoline and issue a uretprobe syscall that passes the protection check. Therefore, if there are bugs in the uretprobe implementation, the injected trampoline potentially exposes those bugs to the attacker.
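The scenario described above can be sketched as a toy model in Python. Everything here is an assumption for illustration: the addresses are made up, and `uretprobe_syscall` only mimics the idea of a call-site check, not the kernel's actual implementation.

```python
# Toy model of a call-site check. Addresses and function names are
# hypothetical; this does not reproduce the kernel's real logic.

# Assumed address of the syscall instruction inside the injected trampoline.
TRAMPOLINE_SYSCALL_ADDR = 0x7000_1000

# Assumed address of a syscall instruction in attacker-controlled code.
ATTACKER_GADGET_ADDR = 0x4000_2000


def uretprobe_syscall(caller_addr):
    """Accept the call only if it was issued from the trampoline."""
    if caller_addr != TRAMPOLINE_SYSCALL_ADDR:
        return "rejected"
    return "handled"


def attacker_direct_call():
    # The syscall instruction that executes lives outside the trampoline,
    # so the call-site check rejects it.
    return uretprobe_syscall(caller_addr=ATTACKER_GADGET_ADDR)


def attacker_jump_to_trampoline():
    # Hijacked control flow jumps into the trampoline, so the syscall
    # instruction that executes is the trampoline's own and passes the check.
    return uretprobe_syscall(caller_addr=TRAMPOLINE_SYSCALL_ADDR)


if __name__ == "__main__":
    print("direct call:", attacker_direct_call())
    print("jump to trampoline:", attacker_jump_to_trampoline())
```

In this model the check constrains where the call comes from, not why control reached that location, which is the gap the comment points at.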
Posted Jan 26, 2025 14:51 UTC (Sun)
by intelfx (subscriber, #130118)
I think you are missing mine. How is it different from an application hijacking control flow or whatever to jump to the previous implementation of this mechanism, i.e., a trap instruction? The answer is "it's not", and we were okay with it.
This argument is clearly going in circles, so in order not to incur the wrath of our editors, I will stop participating in this subthread. (However, I must note that this is not equal to conceding.)
Posted Jan 26, 2025 15:35 UTC (Sun)
by glettieri (subscriber, #15705)
Good point, I see what you mean now.
