Right, there is the obvious issue that most programs use at least glibc and are thus susceptible to, say, glibc being upgraded and starting to use a new system call.
The other issue, if one does use glibc, is what happens when, say, a non-default NSS module is in use, so that calling getpwuid_r() suddenly involves creating a TCP socket or something.
SELinux at least has high-level macros in the policy source, as well as runtime booleans that map to popular system configurations.
That's not to say seccomp is bad - it clearly makes sense to use in the way Chrome is using it. But generalizing out is much harder.
Posted Jul 17, 2012 17:37 UTC (Tue) by sztanpet (subscriber, #60731)
You can also create a blacklist of syscalls, which might be less "bad".
Anyway, the feature is worth it if only for the NoNewPrivileges option, quoting the manual:
"Takes a boolean argument. If true ensures that the service process and all its children can never gain new privileges. This option is more powerful than the respective secure bits flags (see above), as it also prohibits UID changes of any kind. This is the simplest, most effective way to ensure that a process and its children can never elevate privileges again."
Systemd gets seccomp filter support
Posted Jul 17, 2012 21:39 UTC (Tue) by scientes (guest, #83068)
And just to make it clear: in order to use seccomp mode 2 as non-root, you first need to set no_new_privs. This is why the seccomp feature implies no_new_privs, but you can turn no_new_privs off if you really know what you are doing (i.e. you are launching from the main systemd process, which runs as root, and not from a user-session systemd).
Systemd gets seccomp filter support
Posted Jul 20, 2012 0:36 UTC (Fri) by luto (subscriber, #39314)
You don't need to be root to use PR_SET_NO_NEW_PRIVS.
Note that setting this is likely to defeat any selinux protections on the service (if any) -- until selinux adds some magic restrict-only mode and makes it work with no_new_privs, privilege transitions on exec won't happen.
systemd could get fancy and do the selinux transition itself, I suppose.
Take a look at the shiny docs in Documentation/prctl/no_new_privs.txt
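As a rough illustration of the point that no root is needed, here is a minimal sketch that sets and reads back the flag through prctl(2) using ctypes. The constants 38 and 39 are the values of PR_SET_NO_NEW_PRIVS and PR_GET_NO_NEW_PRIVS in <linux/prctl.h>; the helper names are made up for this example, and it assumes Linux 3.5 or later. The flag is set in a forked child because it cannot be cleared once set.

```python
import ctypes
import os

# Values from <linux/prctl.h>.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39

libc = ctypes.CDLL(None, use_errno=True)


def _ul(value):
    # prctl() takes unsigned long arguments; pass them with the right width.
    return ctypes.c_ulong(value)


def set_no_new_privs():
    """Set no_new_privs for the calling process; it cannot be cleared again."""
    if libc.prctl(PR_SET_NO_NEW_PRIVS, _ul(1), _ul(0), _ul(0), _ul(0)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def get_no_new_privs():
    """Return 1 if no_new_privs is set for the calling process, else 0."""
    return libc.prctl(PR_GET_NO_NEW_PRIVS, _ul(0), _ul(0), _ul(0), _ul(0))


def check_in_child():
    """Fork, set the flag in the child only, and return the child's exit code.

    Exit code 0 means the child set the flag and read it back as 1.
    """
    pid = os.fork()
    if pid == 0:
        try:
            set_no_new_privs()
            os._exit(0 if get_no_new_privs() == 1 else 1)
        except BaseException:
            os._exit(2)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Calling check_in_child() as an ordinary user should return 0 on a kernel that supports the flag; the parent process is left untouched.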
Systemd gets seccomp filter support
Posted Jul 17, 2012 18:25 UTC (Tue) by iabervon (subscriber, #722)
It seems to me like it wouldn't be too hard to assemble the filter from a combination of per-program information and per-library information. If systemd knows what your NSS configuration is, it should be able to paste in the appropriate filter. (It's generating the actual BPF from your description anyway, at least in the current implementation, so it could even say 'malloc isn't actually a system call, but I know what you need is brk mmap munmap open("/dev/zero").')
The nice thing about seccomp is that userspace is responsible for providing the policy, and the kernel just enforces it, and userspace can do a lot more analysis than is appropriate for the kernel to do.
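A minimal sketch of that idea, assuming a launcher that keeps lookup tables for library-level names and for NSS backends. Everything here is hypothetical: the table names, the NSS entries, and the expand_filter() helper are invented for illustration, and only the malloc entry is taken from the example in the comment above.

```python
# Hypothetical table: library-level names a unit author might write, expanded
# to the system calls needed for them. The malloc entry follows the comment
# above; a real table would depend on the libc version in use.
LIBRARY_ALIASES = {
    "malloc": {"brk", "mmap", "munmap", "open"},
}

# Hypothetical per-backend requirements, keyed by nsswitch.conf service name.
# The syscall lists are made up and not measured from any real NSS module.
NSS_BACKENDS = {
    "files": {"open", "read", "fstat", "close"},
    "ldap": {"socket", "connect", "sendto", "recvfrom", "poll", "close"},
}


def expand_filter(requested, nss_services):
    """Combine per-program names with per-library and per-NSS information.

    Names found in LIBRARY_ALIASES are replaced by their syscalls; any other
    name is taken to be a syscall already. Returns a sorted list of syscalls.
    """
    allowed = set()
    for name in requested:
        allowed |= LIBRARY_ALIASES.get(name, {name})
    for service in nss_services:
        allowed |= NSS_BACKENDS[service]
    return sorted(allowed)
```

With these tables, expand_filter(["read", "malloc"], ["files"]) yields the union of the program's own list, the malloc expansion, and what the "files" backend needs, so switching the NSS configuration changes the filter without touching the per-program list.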
Systemd gets seccomp filter support
Posted Jul 17, 2012 18:46 UTC (Tue) by walters (subscriber, #7396)
"The nice thing about seccomp is that userspace is responsible for providing the policy, and the kernel just enforces it,"
That's true of SELinux as well; I assume you're referring to AppArmor here or something. Unless you're talking about the ability of a userspace program to *dynamically* adjust its filter in response to configuration files or environment, in which case yes, it's definitely more flexible (although the proposed systemd syntax doesn't allow run-time mutation).
What would be kind of interesting though is if shared libraries could come with lists of system calls they could possibly make. That way if e.g. your app upgrades from GLib 2.28 to 2.30 (in between a lot of things changed, but e.g. I switched the main loop to use eventfd instead of pipe()), your app wouldn't have to change.
That'd require some integration work at the systemd side to introspect the binary before launching it and determine what shared libraries are used.
Systemd gets seccomp filter support
Posted Jul 17, 2012 19:24 UTC (Tue) by iabervon (subscriber, #722)
I'm talking about the parent of a process being able to dynamically adjust the policy for the process right before exec. In SELinux, the policy is written by userspace, but the kernel controls determining the security domain during exec(), and that selects the applicable policy, so there's no userspace involvement at the last minute. Userspace isn't necessarily given a chance to react to changes in NSS configuration between when the configuration last changed and starting new restricted processes.
AFAICT, the systemd syntax doesn't exclude the possibility of listing library functions in your syscall list, and having that trigger run-time mutation. And systemd is obviously constructing BPF based on a combination of your list and stuff it knows, if for no other reason than that it has to figure out syscall numbers from names.
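To make the name-to-number step concrete, here is a rough sketch of how a list of syscall names could be turned into a classic BPF whitelist. This is not systemd's code. The opcode and return values are the ones defined in the kernel's filter and seccomp headers, the syscall numbers are a small x86_64 excerpt, and build_whitelist() and run_filter() are names invented for this example. A real filter would also check the arch field of struct seccomp_data before trusting the number, which is left out here.

```python
# Classic BPF opcodes, as defined in the kernel's BPF headers.
BPF_LD_W_ABS = 0x20   # BPF_LD | BPF_W | BPF_ABS
BPF_JEQ_K = 0x15      # BPF_JMP | BPF_JEQ | BPF_K
BPF_RET_K = 0x06      # BPF_RET | BPF_K

# Filter return values from <linux/seccomp.h>.
SECCOMP_RET_KILL = 0x00000000
SECCOMP_RET_ALLOW = 0x7FFF0000

# Offset of the `nr` field in struct seccomp_data.
SECCOMP_DATA_NR = 0

# A few x86_64 syscall numbers; a real tool needs the full per-arch table.
X86_64_SYSCALLS = {"read": 0, "write": 1, "open": 2, "close": 3,
                   "mmap": 9, "munmap": 11, "brk": 12, "exit_group": 231}


def build_whitelist(names, table=X86_64_SYSCALLS):
    """Return (code, jt, jf, k) tuples that allow only the named syscalls.

    Jump offsets are 8-bit in classic BPF, so this simple layout only works
    for lists of up to 255 names.
    """
    numbers = [table[name] for name in names]
    prog = [(BPF_LD_W_ABS, 0, 0, SECCOMP_DATA_NR)]
    for i, nr in enumerate(numbers):
        # On a match, skip the remaining comparisons and the KILL return.
        prog.append((BPF_JEQ_K, len(numbers) - i, 0, nr))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_KILL))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))
    return prog


def run_filter(prog, syscall_nr):
    """Tiny interpreter for the three opcodes above (illustration only)."""
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        pc += 1
        if code == BPF_LD_W_ABS:
            acc = syscall_nr  # only offset 0 (the syscall number) is modeled
        elif code == BPF_JEQ_K:
            pc += jt if acc == k else jf
        elif code == BPF_RET_K:
            return k
```

The interpreter exists only so the generated program can be checked without loading it into the kernel; the tuples have the same four fields as struct sock_filter.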
Systemd gets seccomp filter support
Posted Jul 17, 2012 19:36 UTC (Tue) by mezcalero (subscriber, #45103)
The syscall filter thingy is not in any way comparable with SELinux, and systemd tightly integrates with SELinux as well.
Please don't think that the syscall filter thingy is intended to replace SELinux in any way. Syscall filtering is hardly comparable to what you can express with SELinux policy. This stuff is useful in a few cases however which SELinux doesn't really cover: it's trivial to write for admins, without the need to get the SELinux policy rebuilt and updated, third party software can easily make use of this to lock itself down, and it works fine even in systemd user instances, i.e. to lock down individual user services or apps without any system policy updates.
So, if anybody tries to compare this with SELinux, then you are comparing apples and oranges and assuming that there was competition in something where there is no competition.
Systemd gets seccomp filter support
Posted Jul 17, 2012 19:43 UTC (Tue) by walters (subscriber, #7396)
I certainly wasn't saying that one *replaces* the other. However it's *perfectly* valid to compare the *tradeoffs* between them.
You even do it yourself in the second paragraph:
"This stuff is useful in a few cases however which SELinux doesn't really cover: it's trivial to write for admins,"
I think "trivial to write for admins" is less true than you think. And many of those issues are the same reasons that writing SELinux policy is hard: version skew of the "app" and the underlying system, delta between tested configuration and deployment, etc.
Systemd gets seccomp filter support
Posted Jul 17, 2012 19:56 UTC (Tue) by mezcalero (subscriber, #45103)
Well, I think it's much easier to write syscall filter lists for the simple reason that everybody knows the main tool for doing that: strace. And what's also nice is that it allows you to write blacklists too, which adds a bit of security, and is super duper easy to do:
And that's all you need to make sure that your process doesn't get access to any IO port or can change the time.
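For the strace approach, a minimal sketch of pulling the set of syscall names out of a trace might look like the following. The regular expression and the syscalls_from_strace() helper are invented for this example, the sample lines are made up, and it assumes plain strace output, optionally with the PID column that -f adds, without timestamps. A trace only covers the code paths that were actually exercised, so the result is a starting point and not a complete list.

```python
import re

# Optional PID column (as produced by strace -f), then the name before "(".
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_from_strace(lines):
    """Return the sorted set of syscall names seen in strace output lines."""
    seen = set()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            seen.add(match.group(1))
    return sorted(seen)


# Made-up sample lines in the usual strace formats.
sample = [
    'open("/etc/passwd", O_RDONLY) = 3',
    '1234 read(3, "root:x:0:0"..., 4096) = 1024',
    '[pid  1235] close(3) = 0',
    '--- SIGCHLD {si_signo=SIGCHLD} ---',
    '+++ exited with 0 +++',
]

print(" ".join(syscalls_from_strace(sample)))
```

Signal and exit lines do not start with a name followed by "(", so they are skipped.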
Systemd gets seccomp filter support
Posted Jul 17, 2012 21:46 UTC (Tue) by jimparis (subscriber, #38647)
Until you remember that iopl() also gives access to IO ports, and direct memory access makes it easy enough to change the time. I don't think blacklists can ever realistically work.
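The gap can be shown with a toy model of the filter decision. It assumes a list syntax in which a leading "~" inverts the list into a blacklist; the specific list used in the test below ("~ioperm settimeofday") is hypothetical and is not the example from the comment above. The helper names are invented.

```python
def parse_filter(spec):
    """Parse 'a b c' (whitelist) or '~a b c' (blacklist).

    Returns (names, inverted), where inverted is True for a blacklist.
    """
    spec = spec.strip()
    inverted = spec.startswith("~")
    names = set(spec.lstrip("~").split())
    return names, inverted


def is_allowed(spec, syscall):
    """Decide whether a syscall passes the filter described by spec."""
    names, inverted = parse_filter(spec)
    if inverted:
        return syscall not in names   # blacklist: anything unlisted passes
    return syscall in names           # whitelist: only listed calls pass
```

With a blacklist that names ioperm but not iopl, iopl still passes, whereas a whitelist that omits both blocks both. That is the asymmetry the comment above points at.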
Systemd gets seccomp filter support
Posted Jul 18, 2012 2:21 UTC (Wed) by jcm (subscriber, #18262)
Just a note here. "Everybody" is "one who is skilled in the art" (of computer programming on Unix and Linux systems). That isn't most sysadmins. It's perhaps most sysadmins I hang out with, but it's not most out there. The idea of sysadmins writing system call filters terrifies me from a support perspective :)
Systemd gets seccomp filter support
Posted Jul 18, 2012 17:55 UTC (Wed) by cmccabe (guest, #60281)
Yeah, I thought the whole idea behind seccomp was that developers would add sandboxing to their own programs. Adding it as yet another sysadmin-configurable knob seems like exactly the wrong direction to go.