BPF will have terrible performance, severe limitations and require some sort of ad-hoc compiler.
Simply allowing kernel modules to filter syscalls is a far simpler and faster approach.
Compilation is not a problem because systems like DKMS can automatically compile kernel modules as needed, and Chromium/Firefox/whatever can just install a DKMS module package.
Not to mention that having a proper unified security and mitigation model would be even better.
If this isn't yet in the mainline kernel, I hope Linus never accepts this.
Cook: seccomp filter now in Ubuntu
Posted Mar 26, 2012 16:47 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
Using BPF to filter syscalls is a stroke of genius. BPF is already used in heavy-duty network filtering code (hey, do you think that iptables are slow?) and it has a simple JIT to work even faster.
Besides, in typical seccomp configurations you won't get a lot of syscalls.
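As a rough illustration of what such a filter looks like (not code from this thread), here is a minimal sketch of a BPF allowlist using the interface as it was later merged into mainline. The choice of allowed syscalls and the names `allowlist` and `install_allowlist` are assumptions made for the example, and the architecture check that a real filter should perform on `seccomp_data.arch` is omitted for brevity.

```c
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/* Hypothetical allowlist: read, write and exit_group pass,
 * every other syscall kills the calling thread. */
struct sock_filter allowlist[] = {
    /* [0] Load the syscall number from struct seccomp_data. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [1]-[3] On a match, jump forward to the ALLOW return at [5]. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_read, 3, 0),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 2, 0),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit_group, 1, 0),
    /* [4] No match: kill. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* [5] Match: allow. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Install the filter on the calling thread; returns 0 on success, -1 on error. */
int install_allowlist(void)
{
    struct sock_fprog prog = {
        .len = sizeof(allowlist) / sizeof(allowlist[0]),
        .filter = allowlist,
    };

    /* Unprivileged callers must set no_new_privs before installing a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

With only a handful of comparisons per syscall, a filter of this size stays cheap even without the JIT.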
Posted Mar 26, 2012 16:59 UTC (Mon) by slashdot (guest, #22014)
DKMS is for those building their own kernels.
> For example, I want to create a sandbox and allow it to access '/home/myname/workarea'. How would you do it?
Use filesystem namespaces or chroot, not syscall filtering (that's under "proper unified security model").
Syscall filtering should be used ONLY to mitigate potential kernel bugs by reducing the attack surface, and certainly not for providing security.
Posted Mar 26, 2012 17:02 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
Both require root access to set up. Not acceptable.
Besides, chroot would require a lot of "mount --bind" magic.
> Syscall filtering should be used ONLY to mitigate potential kernel bugs by reducing the attack surface, and certainly not for providing security.
Why? Syscalls are the primary method of talking with the kernel. It's kinda logical to add filtering there, at the topmost level.
Posted Mar 26, 2012 17:08 UTC (Mon) by slashdot (guest, #22014)
Not at all, because syscalls don't directly map to operations to secure.
For example, filtering access to a path requires hooking dozens of syscalls, reconstructing paths in syscalls such as openat(), handling ioctls that might take paths, and so on.
Of course, then there are race conditions if you just filter, and you actually need to "reissue" syscalls somehow, at tremendous performance cost.
There's simply no way to do it properly, and that's why Linux provides the LSM interface to do that.
Using system call filtering to provide security (and not just mitigation of bugs) is not just insane, it's totally broken.
Posted Mar 26, 2012 17:15 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
Seccomp is not going to be used to sandbox arbitrary executables. It's used to sandbox *your* *own* code, where you *know* which access patterns you'll need to support.
Posted Mar 26, 2012 22:16 UTC (Mon) by aliguori (subscriber, #30636)
There are really only two sane ways to use syscall filtering: as a slightly more powerful mode 1 where you allow a few more obviously safe calls, like select(), or as a mechanism to reduce the kernel's attack surface.
Posted Mar 26, 2012 23:17 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
Posted Mar 26, 2012 23:24 UTC (Mon) by dpquigl (subscriber, #52852)
Posted Mar 26, 2012 23:26 UTC (Mon) by dpquigl (subscriber, #52852)
Posted Mar 27, 2012 11:26 UTC (Tue) by Da_Blitz (guest, #50583)
The PDF exploits the fact that the syscall wrapper has to perform some policy work before copying the data and performing the syscall. It relies on another thread changing the data behind the wrapper's back after the check has been performed but before the syscall is executed.
By doing it on the kernel side, I am assuming that things can't be changed, as the values are passed in registers on most platforms and the BPF checks only look at the values passed to the syscall, not at any memory they may point to in the case of pointers.
So it is safe due to being limited in scope (coarse-grained syscall blocking, i.e. specific syscalls and perhaps an arg or two). Section 8.3 also indicates that this attack can be mitigated by using an in-kernel system.
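To make the "values only, never pointed-to memory" point concrete, here is a small sketch (not from the thread) of a filter that inspects one syscall argument. It assumes the `struct seccomp_data` layout from the mainline interface and a little-endian machine, so that the low 32 bits of `args[0]` sit at its base offset. The filter name and the policy, allowing write() only on file descriptor 2, are made up for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* Hypothetical policy: write() succeeds only on fd 2 and fails with EPERM
 * otherwise; all other syscalls are allowed.
 * The filter sees a copy of the argument registers in struct seccomp_data.
 * It can compare the fd, but it has no way to dereference the buffer
 * pointer in args[1], so there is nothing for another thread to race on. */
struct sock_filter stderr_only_write[] = {
    /* [0] Load the syscall number. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [1] Not write(): skip ahead to ALLOW at [5]. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 0, 3),
    /* [2] Load the low 32 bits of the first argument (little-endian assumed). */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
    /* [3] fd == 2: jump to ALLOW at [5]; otherwise fall through to [4]. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 2, 1, 0),
    /* [4] Fail the call with EPERM instead of executing it. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    /* [5] Allow. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};
```

Anything that depends on the contents of user memory, such as a path string, is out of reach for a filter like this, which is why it only suits coarse-grained rules.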
Posted Mar 26, 2012 19:56 UTC (Mon) by aliguori (subscriber, #30636)
I agree that syscall filtering is strictly to reduce the kernel's attack surface. Access control should be done via an LSM module like SELinux.
Posted Mar 26, 2012 19:16 UTC (Mon) by scientes (guest, #83068)
Posted Mar 26, 2012 19:50 UTC (Mon) by slashdot (guest, #22014)
Also, BPF programs need to be heavily restricted for security reasons, while kernel modules can look at kernel data structures, keep state, allocate memory, use lookup tables, etc.
Of course, for tcpdump and similar, the need to allow unprivileged users to input arbitrary expressions and instantly get results trumps all other considerations, but sandboxing doesn't have this need.
Posted Mar 26, 2012 20:12 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
And BPF programs _by_ _design_ cannot be used to attack the kernel, simply because they don't allow arbitrary expressions, only a safe verifiable subset.
Posted Mar 27, 2012 2:21 UTC (Tue) by kevinm (guest, #69913)
The idea would be for the original author of the application to write the BPF code, not the system administrator.
Posted Mar 27, 2012 12:01 UTC (Tue) by slashdot (guest, #22014)
Yes, you lose the ability for the unprivileged user to install random syscall filters, but does it matter?
Posted Mar 27, 2012 14:51 UTC (Tue) by renox (subscriber, #23785)
Yes, it matters if installing an application implies installing a kernel module.
Posted Mar 28, 2012 17:51 UTC (Wed) by nix (subscriber, #2304)
Posted Mar 28, 2012 19:14 UTC (Wed) by dpquigl (subscriber, #52852)
Posted Mar 28, 2012 19:58 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)
The _parent_ process can start children with arbitrary filters. Children can't override filters (in fact, they are _forced_ to have the NNP flag set).
Posted Mar 28, 2012 21:34 UTC (Wed) by dpquigl (subscriber, #52852)
Posted Mar 28, 2012 23:51 UTC (Wed) by khc (subscriber, #45209)
The assumption is the child process is the one that's loading untrusted data, and so is more likely to be exploitable.
Posted Mar 29, 2012 0:12 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)
The NNP flag is a prerequisite for BPF filtering, to avoid repeating the infamous Sendmail bug.
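For reference, a minimal sketch (not from the thread) of how a process sets and queries the flag, assuming the prctl() interface as it was later merged into mainline (`PR_SET_NO_NEW_PRIVS` and `PR_GET_NO_NEW_PRIVS`). The helper names are made up for the example.

```c
#include <sys/prctl.h>

/* Set no_new_privs on the calling thread. The flag is inherited across
 * fork() and execve() and cannot be cleared once set.
 * Returns 0 on success, -1 on error. */
int enable_no_new_privs(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Returns 1 if no_new_privs is set, 0 if it is not, -1 on error. */
int no_new_privs_enabled(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

Once the flag is set, execve() no longer grants privileges through setuid or setgid bits or file capabilities, so a filtered process cannot use a filter to confuse a more privileged program.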
Posted Mar 27, 2012 7:40 UTC (Tue) by rvfh (subscriber, #31018)
I wish you would stop behaving like this, and stop thinking that other people are stupid and you are so much smarter.
Posted Mar 27, 2012 7:55 UTC (Tue) by gowen (guest, #23914)
Posted Mar 27, 2012 8:45 UTC (Tue) by robert_s (subscriber, #42402)
Posted Mar 27, 2012 13:27 UTC (Tue) by Jannes (subscriber, #80396)
I hope we don't need a user filter or moderation system.
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds