vDSO, 32-bit time, and seccomp
Posted Aug 2, 2019 20:27 UTC (Fri) by nix (subscriber, #2304)
In reply to: vDSO, 32-bit time, and seccomp by chris_se
Parent article: vDSO, 32-bit time, and seccomp
Posted Aug 2, 2019 23:19 UTC (Fri)
by quotemstr (subscriber, #45331)
[Link] (30 responses)
Posted Aug 2, 2019 23:33 UTC (Fri)
by mirabilos (subscriber, #84359)
[Link] (28 responses)
I agree, this is ridiculous.
Posted Aug 3, 2019 0:57 UTC (Sat)
by nix (subscriber, #2304)
[Link] (27 responses)
This is ridiculous. It drives a truck through ABI stability guarantees, even guarantees as carefully maintained as (say) glibc's.
Posted Aug 3, 2019 1:06 UTC (Sat)
by quotemstr (subscriber, #45331)
[Link] (20 responses)
Posted Aug 3, 2019 4:38 UTC (Sat)
by NYKevin (subscriber, #129325)
[Link] (19 responses)
A hypothetical crypto library should not need to call into the sockets API, create processes, manipulate shared memory, access the filesystem, or do a wide variety of other I/O-ish things. A malicious actor trying to exploit a buffer overrun would very much like to do those things, for all manner of reasons, but particularly for key exfiltration. We can reasonably foresee a malicious actor being able to cause such a buffer overrun in a crypto library, because it's actually happened numerous times. Not all of those bugs would have been stopped by seccomp (see for example Heartbleed), but no security measure claims to solve all problems.
At the other extreme, of course a shell is going to call all manner of I/O syscalls (except *maybe* for the sockets API). It really doesn't make sense to try and limit what a shell can do, because the whole point of a shell is to facilitate arbitrary code execution (by the user who is typing commands). Yes, restricted shells exist, but those tend to be sandboxed along different dimensions than "which syscalls are fair game."
Most software is going to fall somewhere between these extremes. So where does that leave us? If I were an upstream, the lesson I would take from this is to just write sensible code, and let downstreams figure out their own security policies. If they file a bug telling me that some of my code is unreasonable, and therefore tripping seccomp, I might fix it. If they file a bug telling me that my code does something that is inconvenient for them, but not unreasonable from where I sit, I would WONTFIX it and let the pieces fall where they may.
Posted Aug 3, 2019 6:17 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (13 responses)
> create processes
> manipulate shared memory
> access the filesystem
Posted Aug 3, 2019 7:01 UTC (Sat)
by NYKevin (subscriber, #129325)
[Link] (3 responses)
Sure, if that's the specific thing that you are doing. But then the application logic knows you are doing that, and can avoid sandboxing it.
> Or it might need to make outgoing connections to validate CRLs, for example.
Gods, no. If the application wants to use a CRL, it downloads it separately, and before applying the sandbox. The crypto library could, of course, provide a helper function for that, but it should not be part of the "main" codepath unless the caller has somehow asked for it. You don't make outgoing connections behind the application code's back.
> Read CA bundles.
read(2) poses substantially less of a security risk than write(2) and open(2), so I don't actually have a problem with this.
Posted Aug 3, 2019 9:24 UTC (Sat)
by storner (subscriber, #119)
[Link] (2 responses)
> Gods, no. If the application wants to use a CRL, it downloads it separately, and before applying the sandbox. The crypto library could, of course, provide a helper function for that, but it should not be part of the "main" codepath unless the caller has somehow asked for it. You don't make outgoing connections behind the application code's back.
Gods, no. CRLs from a public CA are huge, and the cost (time, bandwidth, storage) of downloading one would be prohibitive in most cases. You normally use OCSP, which requires an HTTP(S) network connection. So socket/network access is needed.
Posted Aug 3, 2019 10:56 UTC (Sat)
by chris_se (subscriber, #99706)
[Link]
Although in an ideal world everybody would use OCSP Stapling. That way the client wouldn't need to make OCSP requests to arbitrary destinations; only each server would need to perform such a request every two days or so, and only to its own CA.
Posted Aug 5, 2019 18:20 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link]
Posted Aug 4, 2019 20:27 UTC (Sun)
by rwmj (subscriber, #5474)
[Link] (8 responses)
Posted Aug 4, 2019 21:00 UTC (Sun)
by roc (subscriber, #30627)
[Link] (7 responses)
For example almost every application needs read(). Most don't need the features provided by preadv2(), and those features trigger execution of a bunch of relatively new and untested kernel code. How would you use capabilities to control the ability of a confined process to access those features?
Posted Aug 4, 2019 21:11 UTC (Sun)
by quotemstr (subscriber, #45331)
[Link] (4 responses)
It's circular: we have to block them because they're rare, and they're rare because we block them. We can't make progress that way.
I'm all for addressing specific known vulnerabilities, but this practice of reflexively blocking anything new has got to stop.
Posted Aug 4, 2019 21:36 UTC (Sun)
by roc (subscriber, #30627)
[Link] (3 responses)
Also, many seccomp policies are tailored to the needs of the software they confine, rather than the other way around. Don't tell Chrome or Firefox that they should stop using seccomp policies to sandbox their browser processes because the kernel community needs additional testing of kernel code ... which their browser processes only exercise if they've been compromised.
Posted Aug 5, 2019 0:04 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
Raw syscall filtering really is looking like a bad solution.
Posted Aug 5, 2019 0:49 UTC (Mon)
by roc (subscriber, #30627)
[Link] (1 responses)
But that has nothing to do with this sub-thread, which is about whether capabilities obviate the need for seccomp.
Posted Aug 5, 2019 3:51 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Aug 5, 2019 14:06 UTC (Mon)
by MarcB (subscriber, #101804)
[Link] (1 responses)
Should it be some "personal firewall" to protect potentially vulnerable kernel code or should it restrict the functionality available to processes based on their needs (i.e. classical sandboxing)?
Personally, I think only the second concept is feasible. In that approach, there would be no difference whatsoever between read() and preadv2(), or between clock_gettime64() and clock_gettime(). Those syscalls are equivalent in the sense that they allow a process to do exactly the same things.
If seccomp is used to filter arbitrary syscalls, this will lead to ossification (can't reliably use new syscalls) and maintenance or portability nightmares (just look at the circumstances needed to trigger this problem here). And frankly, if the Linux kernel really needed such a protective filter, it would be high time to switch operating systems (or to significantly change Linux' development process wrt syscalls).
Applications and administrators should define security in terms of the security model provided by the operating system and not start second-guessing it. Doing so would cause the same madness operating system developers are currently experiencing with those hardware vulnerabilities, but on a much larger scale.
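To make the "equivalent syscalls" idea above concrete, here is a toy sketch of a policy expressed in terms of what a process may do rather than which entry points it may call. It is only a model: real seccomp filters are BPF programs matching syscall numbers, and the group names and the `build_allowlist()` helper are hypothetical, made up for illustration. The group contents are limited to the syscalls named in this thread.

```python
# Toy model only: names stand in for syscall numbers, and nothing here talks
# to the kernel. Group names and helpers are hypothetical.

# Syscalls within one group let a process do the same kind of thing.
EQUIVALENT_SYSCALLS = {
    "read_data": {"read", "preadv2"},
    "read_clock": {"clock_gettime", "clock_gettime64"},
}


def build_allowlist(groups):
    """Expand functional group names into the set of permitted syscall names."""
    allowed = set()
    for group in groups:
        allowed |= EQUIVALENT_SYSCALLS[group]
    return allowed


def is_allowed(allowlist, syscall):
    """Return True if the (toy) policy permits the named syscall."""
    return syscall in allowlist


# A policy that says "may read data, may read the clock".
policy = build_allowlist(["read_data", "read_clock"])
```

Under this model a policy author never names clock_gettime64() directly, so a library or vDSO switching between the two clock syscalls would not change the outcome. It does not address the opposing argument in this thread, which is that newer syscalls may expose less-tested kernel code.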
Posted Aug 5, 2019 21:48 UTC (Mon)
by roc (subscriber, #30627)
[Link]
> And frankly, if the Linux kernel really needed such a protective filter,
It does. See https://events.linuxfoundation.org/wp-content/uploads/201...
> it would be high time to switch operating systems (or to significantly change Linux' development process wrt syscalls).
Maybe so but for now seccomp-bpf is needed.
Posted Aug 3, 2019 18:22 UTC (Sat)
by dullfire (guest, #111432)
[Link] (4 responses)
A crypto lib in a program that cannot do any of those things is kind of useless. (Or, alternatively: last I checked, seccomp applies to processes, not shared libs.)
Posted Aug 3, 2019 19:51 UTC (Sat)
by mirabilos (subscriber, #84359)
[Link]
Posted Aug 5, 2019 13:09 UTC (Mon)
by leromarinvit (subscriber, #56850)
[Link] (2 responses)
Posted Aug 5, 2019 13:27 UTC (Mon)
by dullfire (guest, #111432)
[Link] (1 responses)
Posted Aug 5, 2019 15:59 UTC (Mon)
by nybble41 (subscriber, #55106)
[Link]
Posted Aug 3, 2019 1:15 UTC (Sat)
by mirabilos (subscriber, #84359)
[Link] (4 responses)
In contrast to the freedesktop.org/systemd/GNOME people and, apparently, Google, I care for more than just GNU/Linux/{amd,arm}64.
Posted Aug 3, 2019 15:01 UTC (Sat)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Aug 3, 2019 16:17 UTC (Sat)
by nivedita76 (subscriber, #121790)
[Link]
Posted Aug 5, 2019 16:32 UTC (Mon)
by josh (subscriber, #17465)
[Link] (1 responses)
Also, I'd be curious what problems you've observed with the access system call on various operating systems.
Posted Aug 22, 2019 22:13 UTC (Thu)
by mirabilos (subscriber, #84359)
[Link]
the shell uses stat and looks at the various bits (mtime, mode, …) for tests.
The condition “read-only filesystem” is not in the scope of the tests (it’s more of a run-time vs. how-the-fs-tree-is-set-up question) and EROFS will be thrown on actual accesses by the kernel.
Most tests are very low-level:
-g file file's mode has the setgid bit set.
Others aren’t, but…
-w file file exists and is writable.
… considering this is a Unix shell, the Unix file attributes are checked, no extended ones, and I know of no portable way to check for them. (That being said, I do not deal with extended attributes at all, and mksh is normally developed on MirBSD which doesn’t have them anyway, but I understand at least OS/2 and Cygwin/Interix/UWIN/PW32 out of the supported platforms do, if HPFS/NTFS is the underlying filesystem; I’m not familiar enough with these.)
I’d have to look into why access(2) is not normally used. If it’s only false negatives, we could check _both_ access and stat, and if one fails return a failure. This would be dead slow on most operating systems, so I’d only enable it for those that really need it.
I do know that access(2) says the file is executable if the caller is root and the file isn’t. There’s already an access wrapper in the code, and another one for OS/2 (that deals with adding .exe automatically if needed)…
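For readers unfamiliar with the difference between the two approaches described above, here is a rough Python sketch comparing a stat-style mode-bit check with an access(2)-style check. It is not mksh's code: the helper names are hypothetical, and the writability test is deliberately simplified to "any write bit is set" rather than working out which bits apply to the caller.

```python
import os
import stat
import tempfile


def mode_has_setgid(mode):
    """Like `test -g`: True if the setgid bit is set in the mode."""
    return bool(mode & stat.S_ISGID)


def mode_is_writable(mode):
    """Simplified `test -w`: True if any write permission bit is set."""
    return bool(mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def compare_write_checks(path):
    """Return (mode-bit answer, access(2) answer) for writability of path."""
    by_mode = mode_is_writable(os.stat(path).st_mode)
    by_access = os.access(path, os.W_OK)  # asks the kernel about this caller
    return by_mode, by_access


with tempfile.NamedTemporaryFile() as tmp:
    os.chmod(tmp.name, 0o444)  # read-only for everyone
    readonly_result = compare_write_checks(tmp.name)
    os.chmod(tmp.name, 0o644)  # owner may write
    writable_result = compare_write_checks(tmp.name)
```

For the read-only file the mode-bit check always answers "not writable", while the access(2) answer depends on who is asking: run as root it typically reports the file as writable, which is the kind of disagreement between the two methods discussed above.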
Posted Aug 4, 2019 22:37 UTC (Sun)
by marcH (subscriber, #57642)
[Link]
> the generic vDSO implementation naturally used clock_gettime64() as the fallback timekeeping system call on all architectures.
> During the 5.3 merge window, the x86 architecture switched over to the generic version,
If the version of clock_gettime() invoked was really the *internal* implementation detail it seemed to be, there wouldn't have been any issue. Just like firewalls, the seccomp approach doesn't seem to care about layers and abstractions. This basically "promotes" internal implementation details to API rank, right? What could possibly go wrong.
> Even if the kernel community avoids incompatible changes, a change in a library somewhere can invoke a new system call that a given seccomp() policy may frown upon.
Sounds like a "yes".
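The breakage described in the quoted passages can be modelled in a few lines. This is a toy sketch, not real seccomp or vDSO code: syscall names stand in for numbers, the `old_policy` set is a hypothetical allowlist, and returning a negative EPERM is an assumption (a real filter might instead kill the process or return a different errno).

```python
import errno

# Toy model only: nothing here installs a filter or enters the kernel.


def filtered_syscall(allowlist, name):
    """Return 0 if the policy permits the syscall, else a negative errno."""
    return 0 if name in allowlist else -errno.EPERM


def vdso_clock_read(allowlist, fast_path_ok, fallback_syscall):
    """Model a vDSO clock read that only enters the kernel on fallback."""
    if fast_path_ok:
        return 0  # handled in user space; the filter never sees it
    return filtered_syscall(allowlist, fallback_syscall)


# Hypothetical policy written before the fallback syscall changed.
old_policy = {"read", "write", "clock_gettime"}

# Same application, same policy; only the vDSO's internal fallback differs.
before = vdso_clock_read(old_policy, fast_path_ok=False,
                         fallback_syscall="clock_gettime")
after = vdso_clock_read(old_policy, fast_path_ok=False,
                        fallback_syscall="clock_gettime64")
fast = vdso_clock_read(old_policy, fast_path_ok=True,
                       fallback_syscall="clock_gettime64")
```

In this model `before` and `fast` succeed while `after` is denied, even though the application's own code never changed. The failure only appears when the fast path is unavailable, which matches the observation that it takes particular circumstances to trigger the problem.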
Posted Aug 4, 2019 21:04 UTC (Sun)
by roc (subscriber, #30627)
[Link]
> call into the sockets API
Except to set up the kernel-level TLS acceleration. Or it might need to make outgoing connections to validate CRLs, for example.
OK.
Except if it wants to use uring, maybe?
> or do a wide variety of other I/O-ish things.
Read CA bundles.
The situation has not improved.
a tangent (was vDSO, 32-bit time, and seccomp)