vDSO, 32-bit time, and seccomp

Posted Aug 3, 2019 6:17 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)
In reply to: vDSO, 32-bit time, and seccomp by NYKevin
Parent article: vDSO, 32-bit time, and seccomp

> A hypothetical crypto library should not need to
> call into the sockets API
Except to set up the kernel-level TLS acceleration. Or it might need to make outgoing connections to validate CRLs, for example.

> create processes
OK.

> manipulate shared memory
Except if it wants to use io_uring, maybe?

> access the filesystem
> or do a wide variety of other I/O-ish things.
Read CA bundles.

vDSO, 32-bit time, and seccomp

Posted Aug 3, 2019 7:01 UTC (Sat) by NYKevin (subscriber, #129325)

> Except to set up the kernel-level TLS acceleration.

Sure, if that's the specific thing that you are doing. But then the application logic knows you are doing that, and can avoid sandboxing it.

> Or it might need to make outgoing connections to validate CRLs, for example.

Gods, no. If the application wants to use a CRL, it downloads it separately, and before applying the sandbox. The crypto library could, of course, provide a helper function for that, but it should not be part of the "main" codepath unless the caller has somehow asked for it. You don't make outgoing connections behind the application code's back.

> Read CA bundles.

read(2) poses substantially less of a security risk than write(2) and open(2), so I don't actually have a problem with this.

vDSO, 32-bit time, and seccomp

Posted Aug 3, 2019 9:24 UTC (Sat) by storner (subscriber, #119)

> > Or it might need to make outgoing connections to validate CRLs, for example.

> Gods, no. If the application wants to use a CRL, it downloads it separately, and before applying the sandbox. The crypto library could, of course, provide a helper function for that, but it should not be part of the "main" codepath unless the caller has somehow asked for it. You don't make outgoing connections behind the application code's back.

Gods, no. CRLs from a public CA are huge, and the cost (time, bandwidth, storage) of downloading one would be prohibitive in most cases. You normally use OCSP, which requires an HTTP(S) network connection. So socket/network access is needed.

vDSO, 32-bit time, and seccomp

Posted Aug 3, 2019 10:56 UTC (Sat) by chris_se (subscriber, #99706)

> Gods, no. CRLs from a public CA are huge, and the cost (time, bandwidth, storage) of downloading one would be prohibitive in most cases. You normally use OCSP, which requires an HTTP(S) network connection. So socket/network access is needed.

Although, in an ideal world, everybody would use OCSP stapling. That way the client wouldn't need to send OCSP requests to arbitrary destinations; each server would only need to perform such a request every two days or so, and only to its own CA.

vDSO, 32-bit time, and seccomp

Posted Aug 5, 2019 18:20 UTC (Mon) by NYKevin (subscriber, #129325)

A MitM can cause OCSP requests to fail, at which point most stacks choose to fail open. So OCSP provides no security benefit and should be removed to reduce attack surface and network chatter. Or else you should make it fail closed, but nobody actually does that.
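
The difference between the two behaviours can be shown with a minimal Python sketch. Everything named here (`OcspStatus`, `accept_certificate`, the `fail_closed` flag) is hypothetical and made up for illustration; it is not the API of any real TLS stack.

```python
from enum import Enum


class OcspStatus(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNREACHABLE = "unreachable"  # responder timed out, or a MitM dropped the request


def accept_certificate(status: OcspStatus, fail_closed: bool) -> bool:
    """Decide whether to accept a certificate given the OCSP outcome (hypothetical helper)."""
    if status is OcspStatus.GOOD:
        return True
    if status is OcspStatus.REVOKED:
        return False
    # No answer at all: a fail-open stack accepts, a fail-closed stack rejects.
    return not fail_closed


# An attacker who can block the OCSP request only needs the fail-open case.
print(accept_certificate(OcspStatus.UNREACHABLE, fail_closed=False))
print(accept_certificate(OcspStatus.UNREACHABLE, fail_closed=True))
```

In this model the attacker never has to forge a response; making the responder unreachable is enough whenever `fail_closed` is false.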

vDSO, 32-bit time, and seccomp

Posted Aug 4, 2019 20:27 UTC (Sun) by rwmj (subscriber, #5474)

Filtering on system calls is somewhat ridiculous anyway. The proper way to do this is with capabilities. You are given a ticket which allows certain operations (e.g., access to a subdirectory in the filesystem), and you can create new tickets which are subsets of those operations that you hand down to the libraries and components you use. Capabilities are supported by the operating system, so diagnosing problems and working out what capabilities are needed to carry out a whole task can be done at the level of the whole system.
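
As a rough illustration of the "ticket" idea, here is a small Python model of a capability that can only be narrowed, never widened. The `Ticket` class and its `attenuate` and `allows` methods are hypothetical names for this sketch; a real capability system would enforce this in the kernel, not in library code.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Ticket:
    root: PurePosixPath  # directory subtree this ticket covers
    ops: frozenset       # operations permitted, e.g. {"read", "write"}

    def attenuate(self, subdir: str, ops) -> "Ticket":
        """Derive a ticket covering a subdirectory and a subset of operations."""
        sub = PurePosixPath(subdir)
        ops = frozenset(ops)
        if sub.is_absolute() or ".." in sub.parts:
            raise PermissionError("subdir must stay inside the ticket's root")
        if not ops <= self.ops:
            raise PermissionError("a derived ticket cannot gain operations")
        return Ticket(self.root / sub, ops)

    def allows(self, path: str, op: str) -> bool:
        p = PurePosixPath(path)
        if ".." in p.parts:
            return False
        return op in self.ops and (p == self.root or self.root in p.parents)


# The application holds a broad ticket and hands a narrower one to a library.
app = Ticket(PurePosixPath("/srv/app"), frozenset({"read", "write"}))
crypto_lib = app.attenuate("certs", {"read"})
print(crypto_lib.allows("/srv/app/certs/ca.pem", "read"))   # True
print(crypto_lib.allows("/srv/app/certs/ca.pem", "write"))  # False
print(crypto_lib.allows("/srv/app/data/users.db", "read"))  # False
```

The point of the sketch is only the attenuation rule: a derived ticket is always a subset of its parent, so a library can never end up with more authority than the code that called it.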

vDSO, 32-bit time, and seccomp

Posted Aug 4, 2019 21:00 UTC (Sun) by roc (subscriber, #30627)

I'm all for capabilities but the goal of seccomp is to reduce the attack surface of kernel code that the confined process can trigger execution of, and capabilities aren't always an appropriate way to express that.

For example, almost every application needs read(). Most don't need the features provided by preadv2(), and those features trigger execution of a bunch of relatively new and untested kernel code. How would you use capabilities to control the ability of a confined process to access those features?
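
To show what such a distinction looks like at the seccomp-bpf level, the following Python sketch builds a classic-BPF program that allows everything (including read()) but rejects preadv2() and pwritev2(), then runs it through a tiny interpreter. Several things here are assumptions of the sketch: it only models x86_64 (syscall numbers 327 and 328), it returns ENOSYS for the blocked calls, and it never installs the filter (that would take prctl(PR_SET_NO_NEW_PRIVS) plus seccomp(SECCOMP_SET_MODE_FILTER)). Real sandboxes usually use an allowlist instead of a short denylist like this one.

```python
import struct

# Opcode and return values as defined in <linux/bpf_common.h> and <linux/seccomp.h>.
BPF_LD_W_ABS = 0x20  # BPF_LD | BPF_W | BPF_ABS
BPF_JEQ_K = 0x15     # BPF_JMP | BPF_JEQ | BPF_K
BPF_RET_K = 0x06     # BPF_RET | BPF_K
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_KILL_PROCESS = 0x80000000
AUDIT_ARCH_X86_64 = 0xC000003E
ENOSYS = 38
NR_READ, NR_PREADV2, NR_PWRITEV2 = 0, 327, 328  # x86_64 only


def insn(code, jt, jf, k):
    """Pack one struct sock_filter: u16 code, u8 jt, u8 jf, u32 k."""
    return struct.pack("=HBBI", code, jt, jf, k)


def build_filter():
    return b"".join([
        insn(BPF_LD_W_ABS, 0, 0, 4),                      # A = seccomp_data.arch
        insn(BPF_JEQ_K, 1, 0, AUDIT_ARCH_X86_64),         # expected arch: skip the kill
        insn(BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),
        insn(BPF_LD_W_ABS, 0, 0, 0),                      # A = seccomp_data.nr
        insn(BPF_JEQ_K, 2, 0, NR_PREADV2),                # preadv2: jump to the deny
        insn(BPF_JEQ_K, 1, 0, NR_PWRITEV2),               # pwritev2: jump to the deny
        insn(BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW),
        insn(BPF_RET_K, 0, 0, SECCOMP_RET_ERRNO | ENOSYS),
    ])


def evaluate(prog, arch, nr):
    """Interpret the three opcodes used above against a fake seccomp_data."""
    data = struct.pack("=iI", nr, arch)  # nr at offset 0, arch at offset 4
    pc, acc = 0, 0
    while True:
        code, jt, jf, k = struct.unpack_from("=HBBI", prog, pc * 8)
        pc += 1
        if code == BPF_LD_W_ABS:
            acc = struct.unpack_from("=I", data, k)[0]
        elif code == BPF_JEQ_K:
            pc += jt if acc == k else jf
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError(f"opcode {code:#x} not modelled in this sketch")


prog = build_filter()
print(hex(evaluate(prog, AUDIT_ARCH_X86_64, NR_READ)))     # 0x7fff0000 (allow)
print(hex(evaluate(prog, AUDIT_ARCH_X86_64, NR_PREADV2)))  # 0x50026 (errno 38)
```

The filter only ever sees the syscall number and raw arguments, which is exactly why it can separate read() from preadv2() where a file-level capability cannot.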

vDSO, 32-bit time, and seccomp

Posted Aug 4, 2019 21:11 UTC (Sun) by quotemstr (subscriber, #45331)

preadv2 and other new system calls provide new capabilities. These new capabilities let programs do a better job of serving the user. How are these programs supposed to deliver this improved utility to users if security policy blocks the new system calls?

It's circular: we have to block them because they're rare, and they're rare because we block them. We can't make progress that way.

I'm all for addressing specific known vulnerabilities, but this practice of reflexively blocking anything new has got to stop.

vDSO, 32-bit time, and seccomp

Posted Aug 4, 2019 21:36 UTC (Sun) by roc (subscriber, #30627)

In practice, security needs vary, seccomp policies vary, and lots of software runs without a seccomp policy at all, so there is no circular deadlock.

Also, many seccomp policies are tailored to the needs of the software they confine, rather than the other way around. Don't tell Chrome or Firefox that they should stop using seccomp policies to sandbox their browser processes because the kernel community needs additional testing of kernel code ... which their browser processes only exercise if they've been compromised.

vDSO, 32-bit time, and seccomp

Posted Aug 5, 2019 0:04 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

Chrome actually works just fine with pledge() - http://undeadly.org/cgi?action=article;sid=20160107075227

Raw syscall filtering really is looking like a bad solution.

vDSO, 32-bit time, and seccomp

Posted Aug 5, 2019 0:49 UTC (Mon) by roc (subscriber, #30627)

Sure, after modifying pledge() to make it work just fine with Chrome. https://marc.info/?l=openbsd-cvs&m=145207222327683&...

But that has nothing to do with this sub-thread, which is about whether capabilities obviate the need for seccomp.

vDSO, 32-bit time, and seccomp

Posted Aug 5, 2019 3:51 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

I honestly don't mind this approach. Fully generalized systems are not always the best solution.

vDSO, 32-bit time, and seccomp

Posted Aug 5, 2019 14:06 UTC (Mon) by MarcB (subscriber, #101804)

This raises the question of what seccomp is supposed to be.

Should it be some "personal firewall" to protect potentially vulnerable kernel code or should it restrict the functionality available to processes based on their needs (i.e. classical sandboxing)?

Personally, I think only the second concept is feasible. In that approach, there would be no difference whatsoever between read() and preadv2() - or clock_gettime64() and clock_gettime(). Those syscalls are equivalent in the sense that they allow a process to do exactly the same things.

If seccomp is used to filter arbitrary syscalls, this will lead to ossification (new syscalls can't be used reliably) and maintenance or portability nightmares (just look at the circumstances needed to trigger this problem here). And frankly, if the Linux kernel really needed such a protective filter, it would be high time to switch operating systems (or to significantly change Linux' development process wrt syscalls).
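
The ossification problem can be illustrated with a short Python sketch that contrasts a per-name allowlist with a policy expressed in terms of what a call lets the process do. The `SYSCALL_CLASS` table, the class names, and both helper functions are hypothetical; in a real system the grouping would have to be maintained by the operating system, roughly in the spirit of pledge().

```python
# Hypothetical policy written before clock_gettime64() existed.
NAME_ALLOWLIST = {"read", "write", "clock_gettime"}

# Hypothetical grouping of syscalls by the functionality they expose.
SYSCALL_CLASS = {
    "read": "fd-read",
    "preadv2": "fd-read",
    "write": "fd-write",
    "clock_gettime": "clock",
    "clock_gettime64": "clock",
    "socket": "network",
}
GRANTED_CLASSES = {"fd-read", "fd-write", "clock"}


def allowed_by_name(syscall: str) -> bool:
    """Per-name filtering: anything the policy author never listed is denied."""
    return syscall in NAME_ALLOWLIST


def allowed_by_class(syscall: str) -> bool:
    """Functional filtering: equivalent syscalls share one decision."""
    return SYSCALL_CLASS.get(syscall) in GRANTED_CLASSES


for name in ("clock_gettime", "clock_gettime64", "preadv2", "socket"):
    print(name, allowed_by_name(name), allowed_by_class(name))
```

Under the per-name policy, a libc that switches from clock_gettime() to clock_gettime64() breaks the sandboxed program even though nothing about its abilities changed; under the class-based policy the new call is covered as soon as the table knows about it.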

Applications and administrators should define security in terms of the security model provided by the operating system and not start second-guessing it. Doing so would cause the same madness operating system developers are currently experiencing with those hardware vulnerabilities, but on a much larger scale.

vDSO, 32-bit time, and seccomp

Posted Aug 5, 2019 21:48 UTC (Mon) by roc (subscriber, #30627)

Google developed seccomp-bpf for the Chrome sandbox and "protect potentially vulnerable kernel code" was explicitly a goal. I don't know why you think that isn't feasible; it is feasible, and it's working.

> And frankly, if the Linux kernel really needed such a protective filter,

It does. See https://events.linuxfoundation.org/wp-content/uploads/201...
The situation has not improved.

> it would be high time to switch operating systems (or to significantly change Linux' development process wrt syscalls).

Maybe so, but for now seccomp-bpf is needed.