Re: [PATCH v9 05/13] seccomp_filter: Document what seccomp_filter is and how it works.

From:  Will Drewry
To:  Ingo Molnar
Subject:  Re: [PATCH v9 05/13] seccomp_filter: Document what seccomp_filter is and how it works.
Date:  Fri, 1 Jul 2011 11:43:41 -0500
Cc:  James Morris, Chris Evans, Linus Torvalds, Randy Dunlap, Eric Paris

On Fri, Jul 1, 2011 at 11:10 AM, Ingo Molnar wrote:
> * Will Drewry wrote:
>> From my view, ftrace events are not ready for the job yet - and
>> relying purely on available wrapped events may make it unsuitable
>> for attack surface reduction forever.  As is, there is no compat
>> syscall support.  Many syscalls are not wrapped at present and no
>> one ack'd my earlier patches around wrapping more.  All of perf
>> needs to be overhauled to share per-task infrastructure. A new ABI
>> needs to be proposed if my prctl() changes are not acceptable to
>> handle some of the security-focused behavioral requirements.
>> Performance characteristics need to be better analyzed as the
>> current perf list_head approach may not scale as desired.  The list
>> goes on.  My proof of concept patch for "event filters" was just
>> that - a proof of concept.  To truly share the filter events is a
>> large amount of work that may not be viable, and I believe you know
>> that as well as I do.
> But that's exactly my point: i consider it the right way forward
> because it maximizes kernel utility in the long run.

Not if it never happens.  Which is what happened with the proposals
from Adam and from Eric.

> Note that *all* the specific technical items you mention:
>  - wrapping more syscalls (i.e. making syscall tracing
>   feature-complete)
>  - a clean filtering ABI
>  - performance improvements. (Note that this one is already
>   in progress, Thomas has written an IDR implementation that
>   eliminates the list iteration entirely. You could help him
>   finish it.)

I was thinking specifically about how filter events are stored and
accessed.  But sure, I could try to contribute to any number of
related efforts.

> are not some bad side effect or quirk, they are all generic
> improvements we want in any case and not just for sandboxing.

I didn't say they were bad side effects or quirks.

> You might not be interested in all of those items, you are only
> interested in getting the narrow feature-set you are interested in,
> but you sure are interested in getting sandboxing versus not getting
> anything at all, right?

Unfortunately, that isn't the value proposition for me or many other
contributors.  The real question is whether I am interested in getting
sandboxing in with mainline or if I want to sign up to maintain the
patches out of tree until my hair falls out.

I would much prefer to have a solution that Linux users as a whole can
benefit from and not just a subset of users I affect, but it's not a
hostage situation. I was hoping to work toward a solution that met
needs in the near future while being able to continue to invest in
driving long term changes.  If all the other work is a prerequisite
for system call restriction, I'll be very lucky to see anything this
calendar year assuming I can even write the patches in that time.

> Not doing it right because "it's too much work", especially as the

I'm not averse to work, but I don't necessarily feel that the extra
work is justified.  I also have to deal with my own personal and work
time constraints.

> trivial 'proof of concept' prototype already gave us something very
> promising that worked to a fair degree:
>       bitmask (2009):  6 files changed,  194 insertions(+), 22 deletions(-)
>  filter engine (2010): 18 files changed, 1100 insertions(+), 21 deletions(-)
>  event filters (2011):  5 files changed,   82 insertions(+), 16 deletions(-)
> are pretty hollow arguments to me. That diffstat sums up my argument
> of proper structure pretty well.

I wrote that code so I know how hollow the diffstats are.  The 82
lines of code do not:
- convince perf maintainers to share per-task events with "event filter" code
- provide reduce-privilege-only semantics
- provide a clean ABI that doesn't stomp all over the perf ABI
- provide compat syscalls
- provide a rewrite of SYSCALL_DEFINE* to support non 'long' syscalls
- provide ptreg syscall support
- provide any sort of blocking guarantees for unhooked system call events
- ...
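
To make the "reduce-privilege-only semantics" item above concrete, here is a
minimal user-space sketch in C. It is not code from this patch series or from
the "event filters" prototype. The names task_model, restrict_syscalls,
syscall_allowed and NR_SYSCALLS_MODEL are hypothetical, and the 64-entry
bitmask is an assumption chosen only to keep the example short. The one
property it shows is that installing a new mask can only narrow the allowed
set, never widen it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size: a real syscall table has far more entries. */
#define NR_SYSCALLS_MODEL 64

/* Hypothetical per-task state: bit n set means syscall n is permitted. */
struct task_model {
	uint64_t allowed;
};

/*
 * Apply a requested mask. The new state is the intersection of the
 * current mask and the request, so a bit that has been cleared can
 * never be set again by a later call.
 */
static void restrict_syscalls(struct task_model *t, uint64_t requested)
{
	t->allowed &= requested;
}

/* Check one syscall number against the task's current mask. */
static bool syscall_allowed(const struct task_model *t, unsigned int nr)
{
	if (nr >= NR_SYSCALLS_MODEL)
		return false;	/* numbers outside the model are denied */
	return (t->allowed >> nr) & 1u;
}
```

In this model, a task that starts with every bit set and first restricts
itself to syscalls 0 and 1, then asks for 1 and 2, ends up with only
syscall 1: the second request cannot bring syscall 2 back.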

I'd like to be able to move along security for the platform today and
not in two years, but if my only chance of any form of this being
ACK'd is to write it such that it shares code with perf and has a
shiny new ABI, then I'll queue up the work for when I can start trying
to tackle it.

That said, I still feel that this patch series is the right thing to
do now - not just for my personal reasons but for the kernel too.

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds