Systemd gets seccomp filter support

Posted Jul 17, 2012 16:38 UTC (Tue) by hmh (subscriber, #3838)
Parent article: Systemd gets seccomp filter support

Please correct me if I'm wrong, but knowing exactly which syscalls to filter to properly sandbox a process is anything but "simple".

AFAIK, it might not be a static target, either. Although it certainly shouldn't be a fast-moving one...



Systemd gets seccomp filter support

Posted Jul 17, 2012 17:02 UTC (Tue) by walters (subscriber, #7396) [Link]

Right, there is the obvious issue that most programs use at least glibc and are thus susceptible to, say, glibc being upgraded and using a new system call.

The other issue, if one does use glibc, is what happens when, say, a different NSS module is in use than the default, and all of a sudden calling getpwuid_r() involves creating a TCP socket or something?

SELinux at least has high-level macros in the policy source, as well as runtime booleans that map to popular system configurations.

That's not to say seccomp is bad - it clearly makes sense to use in the way Chrome is using it. But generalizing out is much harder.

Systemd gets seccomp filter support

Posted Jul 17, 2012 17:37 UTC (Tue) by sztanpet (subscriber, #60731) [Link]

You can also create a blacklist of syscalls, which might be less "bad".
Anyway, the feature is worth it if only for the NoNewPrivileges option. Quoting the manual:

"Takes a boolean argument. If true ensures that the service process and all its children can never gain new privileges. This option is more powerful than the respective secure bits flags (see above), as it also prohibits UID changes of any kind. This is the simplest, most effective way to ensure that a process and its children can never elevate privileges again."

Systemd gets seccomp filter support

Posted Jul 17, 2012 21:39 UTC (Tue) by scientes (guest, #83068) [Link]

And just to make it clear: in order to use seccomp 2 as non-root, you first need to set No New Privs. This is why the seccomp feature implies no new privs, but you can turn no new privs off if you really know what you are doing (and are launching from the main systemd process that is root, not from a user-session systemd).

Systemd gets seccomp filter support

Posted Jul 20, 2012 0:36 UTC (Fri) by luto (subscriber, #39314) [Link]

You don't need to be root to use PR_SET_NO_NEW_PRIVS.
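A minimal sketch of that call from an unprivileged process, assuming Linux 3.5 or later and using Python's ctypes to reach prctl(2). The values 38 and 39 are PR_SET_NO_NEW_PRIVS and PR_GET_NO_NEW_PRIVS from <linux/prctl.h>; the helper name set_no_new_privs is made up for illustration.

```python
import ctypes
import os
import sys

# Option values from <linux/prctl.h>.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39


def set_no_new_privs():
    """Set no_new_privs on the calling process and return the flag as read back."""
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    # The kernel requires the unused arguments to be zero.
    if libc.prctl(PR_SET_NO_NEW_PRIVS, ctypes.c_ulong(1), zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return libc.prctl(PR_GET_NO_NEW_PRIVS, zero, zero, zero, zero)


if __name__ == "__main__" and sys.platform.startswith("linux"):
    print("no_new_privs =", set_no_new_privs())
```

The flag cannot be cleared once set, and it is inherited across fork and exec, so this is best tried in a throwaway process.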

Note that setting this is likely to defeat any selinux protections on the service (if any) -- until selinux adds some magic restrict-only mode and makes it work with no_new_privs, privilege transitions on exec won't happen.

systemd could get fancy and do the selinux transition itself, I suppose.

Take a look at the shiny docs in Documentation/prctl/no_new_privs.txt

Systemd gets seccomp filter support

Posted Jul 17, 2012 18:25 UTC (Tue) by iabervon (subscriber, #722) [Link]

It seems to me like it wouldn't be too hard to assemble the filter from a combination of per-program information and per-library information. If systemd knows what your NSS configuration is, it should be able to paste in the appropriate filter. (It's generating the actual BPF from your description anyway, at least in the current implementation, so it could even say 'malloc isn't actually a system call, but I know what you need is brk mmap munmap open("/dev/zero").')

The nice thing about seccomp is that userspace is responsible for providing the policy, and the kernel just enforces it, and userspace can do a lot more analysis than is appropriate for the kernel to do.

Systemd gets seccomp filter support

Posted Jul 17, 2012 18:46 UTC (Tue) by walters (subscriber, #7396) [Link]

"The nice thing about seccomp is that userspace is responsible for providing the policy, and the kernel just enforces it,"

That's true of SELinux as well; I assume you're referring to AppArmor here or something. Or unless you're talking about the ability of a userspace program to *dynamically* adjust its filter in response to configuration files or environment, in which case yes it's definitely more flexible (although the proposed systemd syntax doesn't allow run-time mutation).

What would be kind of interesting though is if shared libraries could come with lists of system calls they could possibly make. That way if e.g. your app upgrades from GLib 2.28 to 2.30 (in between a lot of things changed, but e.g. I switched the main loop to use eventfd instead of pipe()), your app wouldn't have to change. That'd require some integration work at the systemd side to introspect the binary before launching it and determine what shared libraries are used.
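To make the per-library idea concrete, here is a rough sketch of how a launcher could merge a program's own syscall list with lists shipped by the libraries it links against. Everything in it is hypothetical: no such manifests exist, the LIBRARY_SYSCALLS contents are illustrative rather than accurate, and build_whitelist is an invented helper.

```python
# Hypothetical per-library syscall manifests; contents are illustrative only.
LIBRARY_SYSCALLS = {
    "glib-2.28": {"pipe", "poll", "read", "write"},
    "glib-2.30": {"eventfd2", "poll", "read", "write"},
    "malloc": {"brk", "mmap", "munmap"},
}


def build_whitelist(program_syscalls, libraries):
    """Union the program's own syscalls with those declared by each library."""
    allowed = set(program_syscalls)
    for lib in libraries:
        allowed |= LIBRARY_SYSCALLS[lib]
    return "SystemCallFilter=" + " ".join(sorted(allowed))


if __name__ == "__main__":
    # The application's own list stays the same across the library upgrade.
    print(build_whitelist({"open", "close"}, ["glib-2.28", "malloc"]))
    print(build_whitelist({"open", "close"}, ["glib-2.30", "malloc"]))
```

With this shape, upgrading the library changes only the library's manifest; the application's own entry is untouched.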

Systemd gets seccomp filter support

Posted Jul 17, 2012 19:24 UTC (Tue) by iabervon (subscriber, #722) [Link]

I'm talking about the parent of a process being able to dynamically adjust the policy for the process right before exec. In SELinux, the policy is written by userspace, but the kernel controls determining the security domain during exec(), and that selects the applicable policy, so there's no userspace involvement at the last minute. Userspace isn't necessarily given a chance to react to changes in NSS configuration between when the configuration last changed and starting new restricted processes.

AFAICT, the systemd syntax doesn't exclude the possibility of listing library functions in your syscall list, and having that trigger run-time mutation. And systemd is obviously constructing BPF based on a combination of your list and stuff it knows, if for no other reason than that it has to figure out syscall numbers from names.
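The name-to-number step can be pictured with a small sketch. The table below holds a handful of x86_64 syscall numbers only; a real tool would take them from the kernel headers for the target architecture, and resolve_syscalls is an invented name, not anything systemd provides.

```python
# A few x86_64 syscall numbers; other architectures use different values.
X86_64_SYSCALL_NUMBERS = {
    "read": 0,
    "write": 1,
    "open": 2,
    "close": 3,
    "mmap": 9,
    "munmap": 11,
    "brk": 12,
}


def resolve_syscalls(names, table=X86_64_SYSCALL_NUMBERS):
    """Translate syscall names to numbers, rejecting names the table lacks."""
    unknown = sorted(n for n in names if n not in table)
    if unknown:
        raise KeyError("unknown syscall name(s): " + ", ".join(unknown))
    return sorted(table[n] for n in names)


if __name__ == "__main__":
    print(resolve_syscalls(["brk", "mmap", "munmap", "open"]))
```

The BPF program compares the syscall number reported by the kernel against values like these, which is why the same list of names compiles to a different filter on each architecture.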

Systemd gets seccomp filter support

Posted Jul 17, 2012 19:36 UTC (Tue) by mezcalero (subscriber, #45103) [Link]

The syscall filter thingy is not in any way comparable with SELinux, and systemd tightly integrates with SELinux as well.

Please don't think that the syscall filter thingy is intended to replace SELinux in any way. Syscall filtering is hardly comparable to what you can express with SELinux policy. This stuff is useful, however, in a few cases which SELinux doesn't really cover: it's trivial to write for admins, without the need to get the SELinux policy rebuilt and updated; third-party software can easily make use of this to lock itself down; and it works fine even in systemd user instances, i.e. to lock down individual user services or apps without any system policy updates.

So, if anybody tries to compare this with SELinux, then you are comparing apples and oranges and assuming that there was competition in something where there is no competition.

Systemd gets seccomp filter support

Posted Jul 17, 2012 19:43 UTC (Tue) by walters (subscriber, #7396) [Link]

I certainly wasn't saying that one *replaces* the other. However it's *perfectly* valid to compare the *tradeoffs* between them.

You even do it yourself in the second paragraph:

"This stuff is useful in a few cases however which SELinux doesn't really cover: it's trivial to write for admins,"

I think "trivial to write for admins" is less true than you think. And some of those issues are the same reasons that writing SELinux policy is hard: version skew of the "app" and the underlying system, delta between tested configuration and deployment, etc.

Systemd gets seccomp filter support

Posted Jul 17, 2012 19:56 UTC (Tue) by mezcalero (subscriber, #45103) [Link]

Well, I think it's much easier to write syscall filter lists for the simple reason that everybody knows the main tool for doing that: strace. And what's also nice is that it allows you to write blacklists too, which adds a bit of security, and is super duper easy to do:

SystemCallFilter=~ioperm settimeofday clock_settime

And that's all you need to make sure that your process can't get access to any IO port or change the time.

Systemd gets seccomp filter support

Posted Jul 17, 2012 21:46 UTC (Tue) by jimparis (subscriber, #38647) [Link]

Until you remember that iopl() also gives access to IO ports, and direct memory access makes it easy enough to change the time. I don't think blacklists can ever realistically work.
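A toy model of the filter semantics shows the gap. It assumes only what the directive quoted in this thread shows, namely that a leading ~ turns the list into a blacklist; parse_filter and is_allowed are invented names, and this is not how systemd itself parses the option.

```python
def parse_filter(value):
    """Return (is_blacklist, names) for a SystemCallFilter-style value."""
    value = value.strip()
    is_blacklist = value.startswith("~")
    names = set(value.lstrip("~").split())
    return is_blacklist, names


def is_allowed(value, syscall):
    """A blacklist permits everything not named; a whitelist permits only what is named."""
    is_blacklist, names = parse_filter(value)
    if is_blacklist:
        return syscall not in names
    return syscall in names


if __name__ == "__main__":
    blacklist = "~ioperm settimeofday clock_settime"
    for name in ("ioperm", "iopl", "read"):
        print(name, is_allowed(blacklist, name))
```

Under the blacklist, ioperm is refused but iopl passes, simply because nobody listed it. A whitelist fails the other way: an unlisted call is refused rather than silently permitted.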

Systemd gets seccomp filter support

Posted Jul 18, 2012 2:21 UTC (Wed) by jcm (subscriber, #18262) [Link]

Just a note here. "Everybody" is "one who is skilled in the art" (of computer programming on Unix and Linux systems). That isn't most sysadmins. It's perhaps most sysadmins I hang out with, but it's not most out there. The idea of sysadmins writing system call filters terrifies me from a support perspective :)

Systemd gets seccomp filter support

Posted Jul 18, 2012 17:55 UTC (Wed) by cmccabe (guest, #60281) [Link]

Yeah, I thought the whole idea behind seccomp was that developers would add sandboxing to their own programs. Adding it as yet another sysadmin-configurable knob seems like exactly the wrong direction to go.

Systemd gets seccomp filter support

Posted Jul 17, 2012 18:53 UTC (Tue) by Ben_P (guest, #74247) [Link]

This seems like a good fit for strace. Maybe run some tests or benchmarks to get your whitelist, then refine from there?

Systemd gets seccomp filter support

Posted Jul 17, 2012 19:27 UTC (Tue) by mezcalero (subscriber, #45103) [Link]

Knowing the syscalls to whitelist is really easy, as strace shows you exactly that.
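As a rough illustration of the strace approach, the sketch below pulls syscall names out of captured strace output and prints them as a whitelist. The regular expression covers only the common line shapes (plain, "[pid N]" and leading-PID prefixes), the sample lines are made up, and the result reflects only the code paths that actually ran under strace.

```python
import re

# Matches "name(" at the start of a line, optionally after "[pid N]" or "N ".
SYSCALL_LINE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_from_strace(lines):
    """Collect the set of syscall names seen in strace output lines."""
    names = set()
    for line in lines:
        match = SYSCALL_LINE.match(line)
        if match:
            names.add(match.group(1))
    return names


def whitelist_line(lines):
    """Format the observed syscalls as a SystemCallFilter= whitelist."""
    return "SystemCallFilter=" + " ".join(sorted(syscalls_from_strace(lines)))


if __name__ == "__main__":
    sample = [
        'execve("/bin/true", ["true"], [/* 20 vars */]) = 0',
        "brk(0) = 0x1f2e000",
        '[pid  4242] open("/etc/passwd", O_RDONLY) = 3',
        "--- SIGCHLD (Child exited) @ 0 (0) ---",
        "+++ exited with 0 +++",
    ]
    print(whitelist_line(sample))
```

Signal and exit lines are skipped because they do not start with a syscall name.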

Systemd gets seccomp filter support

Posted Jul 17, 2012 19:32 UTC (Tue) by dlang (subscriber, #313) [Link]

Only if you are sure that you exercise every possible code path while running under strace. Otherwise you run the risk of working most of the time, but failing sometimes.

Systemd gets seccomp filter support

Posted Jul 17, 2012 19:36 UTC (Tue) by felixfix (subscriber, #242) [Link]

What if you upgrade a package which introduces new syscalls, then reboot? Bingo, boot fails. That's pretty abrupt.

Systemd gets seccomp filter support

Posted Jul 17, 2012 19:43 UTC (Tue) by mezcalero (subscriber, #45103) [Link]

Almost no service in a systemd install actually causes the boot to fail. Basically only file system mounts can do that, and very little else.

But in general this discussion is really pointless. If you write a syscall filter list, an SELinux policy, a capabilities list, or an apparmor policy: they all have in common that you need a good idea what a specific program is allowed to do and what not. So syscall filter lists have the same "problem" as any other security technology, there is nothing new in this.

Note, however, that of all the techs listed above, writing a syscall filter list is probably by far the easiest, since most admins have probably played around with the tool for that at least once in their life: strace.

Systemd gets seccomp filter support

Posted Jul 18, 2012 19:50 UTC (Wed) by lindi (subscriber, #53135) [Link]

strace is not ideal for passively collecting syscall usage statistics of the whole system. I personally use the following systemtap snippet:
#!/usr/bin/stap
global syscall_usage;

probe syscall.* {
    syscall_usage[execname(), probefunc()]++;
}
probe timer.ms(1000) {
    printf("==== syscall usage statistics\n");
    foreach ([e, s] in syscall_usage-) {
        printf("%s %s %d\n", e, s, syscall_usage[e, s]);
    }
}
Example output from a Debian wheezy Xen instance:
==== syscall usage statistics
sshd sys_rt_sigprocmask 1032
sshd sys_select 516
sshd sys_read 258
sshd sys_write 258
watchdog sys_write 98
stapio sys_read 43
stapio sys_ppoll 35
watchdog sys_open 14
watchdog sys_close 14
watchdog sys_lseek 14
watchdog sys_read 14
watchdog sys_nanosleep 14
ntpd sys_select 9
ntpd sys_ioctl 8
ntpd sys_clock_gettime 8
ntpd sys_rt_sigreturn 7
stapio sys_write 6
stapio sys_fcntl 4
ntpd sys_read 3
ntpd sys_close 2
init sys_newstat 2
stapio sys_pselect6 1
ntpd sys_socket 1
ntpd sys_open 1
ntpd sys_newfstat 1
ntpd sys_mmap_pgoff 1
ntpd sys_lseek 1
ntpd sys_munmap 1
init sys_time 1
init sys_newfstat 1
init sys_select 1
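Output in this shape can be folded into one syscall set per program with a few lines of Python. This is only a sketch: per_program_syscalls is an invented helper, and the ALIASES table is an assumption about how a few kernel function names map back to syscall names (sys_mmap_pgoff, for instance, may correspond to mmap or mmap2 depending on the architecture).

```python
from collections import defaultdict

# Assumed mapping from kernel function names to syscall names; check per architecture.
ALIASES = {"newstat": "stat", "newfstat": "fstat", "mmap_pgoff": "mmap"}


def per_program_syscalls(lines):
    """Group 'execname sys_name count' lines into a set of syscalls per program."""
    usage = defaultdict(set)
    for line in lines:
        parts = line.split()
        # Skip the banner line and anything else that is not a three-field record.
        if len(parts) != 3 or not parts[1].startswith("sys_"):
            continue
        name = parts[1][len("sys_"):]
        usage[parts[0]].add(ALIASES.get(name, name))
    return dict(usage)


if __name__ == "__main__":
    sample = [
        "==== syscall usage statistics",
        "init sys_newstat 2",
        "init sys_time 1",
        "init sys_newfstat 1",
        "init sys_select 1",
    ]
    for prog, names in per_program_syscalls(sample).items():
        print(prog, " ".join(sorted(names)))
```

As with strace, the resulting sets only describe what the programs did while the probe was running.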

Systemd gets seccomp filter support

Posted Jul 18, 2012 9:08 UTC (Wed) by renox (subscriber, #23785) [Link]

Blacklists are always tricky to get right, whether it's in a sandbox, a firewall or whatever.

I think that it's better to take advantage of the sources to get the list of syscalls used.

Systemd gets seccomp filter support

Posted Jul 18, 2012 9:49 UTC (Wed) by anselm (subscriber, #2796) [Link]

The problem here is that the sources for a program don't actually tell you the exact syscalls it uses. What you can see is the C library calls, which may or may not be mapped one-to-one to actual syscalls into the kernel that seccomp could intercept.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds