Re: [PATCH RFC] syslog ns proof of concept

From:  "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA-AT-public.gmane.org>
To:  "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w-AT-public.gmane.org>
Subject:  Re: [PATCH RFC] syslog ns proof of concept
Date:  Mon, 19 Nov 2012 14:18:15 +0000
Message-ID:  <20121119141815.GB4321@mail.hallyn.com>
Cc:  containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA-AT-public.gmane.org, Stéphane Graber <stephane.graber-Z7WLFzj8eWMS+FvcfC7Uqw-AT-public.gmane.org>, Daniel Lezcano <dlezcano-GANU6spQydw-AT-public.gmane.org>

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> 
> > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
> >> 
> >> > Introduce a system log namespace.  The syslog ns is tied to a user
> >> > namespace.  You must create a new user namespace before you can create a
> >> > new syslog ns.  The syslog ns is created through a new command (11) to
> >> > the __NR_syslog system call.
> >> >
> >> > Once a task enters a new syslog ns, its "dmesg", "dmesg -c" and
> >> > /dev/kmsg actions affect only itself, so that user-created syslog
> >> > messages no longer are confusingly combined in the host's syslog.
> >> > "printk" itself always goes to the initial syslog_ns, and consoles
> >> > belong only to the initial syslog_ns.  However printks relating to a
> >> > specific network namespace, for instance, can now be targeted to the
> >> > syslog ns for the user ns which owns the network ns, aiding in debugging
> >> > in a container.
> >> >
> >> > This patch is on top of the user namespace enhanced kernel at
> >> > git://kernel.ubuntu.com/serge/quantal-userns.  It is good enough to
> >> > compile with stock ubuntu kernel options, boot, launch other syslog
> >> > namespaces and exercise them.  It will need help before it will compile
> >> > with funky options like CONFIG_PRINTK=n.  This is only being sent out to
> >> > get feedback on the general idea.
> >> >
> >> > Comments greatly appreciated.
> >> >
> >> > (See https://wiki.ubuntu.com/LxcSyslogNs for background).
> >> 
> >> Overall I would say the goal sounds well thought out.
> >> 
> >> I am not a fan of how this ties into the user namespace.  I would prefer
> >> closer or looser ties.  The recursive reference count loop where a
> >> userns refers to a syslogns and that syslogns refers to the same userns
> >> is unpleasant.
> >
> > We could make the nsproxy point to the syslog_ns, but this seemed simpler.
> > Note that the syslog_ns does not need to pin the user_ns, since by design
> > the user_ns owning a syslog_ns can't go away if the syslog_ns is still
> > alive.
> >
> > But yes, the question of "what should point to the syslog_ns" is what has
> > kept a syslog_ns from being seriously proposed since February 2010 :)
> >
> > Hm, wait.  A nagging feeling made me look back, and I see that I do in
> > fact pin the user_ns from the syslog_ns.  I didn't mean to (and I don't
> > release it :)  and we don't need to.  When a syslog_ns is created, it
> > can only be inherited by child user_ns's, and its owner, the parent user_ns,
> > can never go away until the child user_ns's go away.
> 
> There is an argument to be made that syslog messages are the kind of
> security identifiers, like uids, gids, and keys, that should be part of a
> user namespace.  I'm not fully convinced, but there are some DoS attacks
> that it would naturally prevent.

I can't really think of a good case for not putting the syslogns straight
into the userns (i.e. not having a separate syslogns), so I'd say let's
go that route.

There is a big locking bug (besides syslog_ns pinning user_ns) in my
patch - something needs to be done with struct cont, which pins the
syslog_ns.  So either when a user_ns is freed we need to flush struct
cont if it is pinning this user_ns, or the struct cont should
explicitly pin the user_ns.
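
A minimal userspace sketch of the second option (struct cont explicitly
pinning the user_ns) may make the lifetime rule concrete.  Everything
here is a hypothetical stand-in, not the patch's code: the struct
layouts, the "freed" flag, and the demo_* helpers are assumptions made
for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's user namespace. */
struct user_ns {
	int refcount;
	int freed;		/* set when the last reference is dropped */
};

/* Hypothetical stand-in for the pending continuation-line buffer. */
struct cont {
	struct user_ns *owner;	/* namespace the partial line belongs to */
	size_t len;		/* bytes of buffered partial line */
};

struct cont cont;

struct user_ns *demo_get_user_ns(struct user_ns *ns)
{
	ns->refcount++;
	return ns;
}

void demo_put_user_ns(struct user_ns *ns)
{
	assert(ns->refcount > 0);
	if (--ns->refcount == 0)
		ns->freed = 1;	/* a real implementation would free here */
}

/* Emit or discard the partial line, then drop the pin it held. */
void cont_flush(void)
{
	struct user_ns *ns = cont.owner;

	if (!ns)
		return;
	cont.owner = NULL;
	cont.len = 0;
	demo_put_user_ns(ns);
}

/* Start buffering a partial line: take a reference so the namespace
 * cannot be freed while cont still points at it. */
void cont_start(struct user_ns *ns, size_t len)
{
	cont_flush();		/* release whatever was pinned before */
	cont.owner = demo_get_user_ns(ns);
	cont.len = len;
}
```

With this shape the namespace outlives its last task for as long as a
partial line is buffered, and goes away on the next flush.  The other
option (flushing cont from the user_ns free path) avoids the extra
reference but needs the free path to check cont under the same lock.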

> >> The important case as I understand it is to handle injection of messages
> >> into dmesg by userspace?
> >
> > 1. injection of messages into dmesg by userspace, 2. clearing of messages
> > by userspace, but also 3. allowing appropriate kernel printks to be
> > targeted to containers.
> >
> >> I would really like to see how messages from networking devices and
> >> netfilter would be handled.  Right now one of the ugliest bits of
> >
> > It would simply replace a
> > 	printk(KERN_NOTICE "doing something\n");
> > with
> > 	nsprintk(net->user_ns->syslog_ns, KERN_NOTICE "doing something\n");
> >
> > I'm not yet clear on whether we'd want nsprintk to print to both the
> > init_syslog_ns (with a ns prefix) and the child ns.
> 
> There are some specialized forms of printk like dev_printk and in
> particular netdev_printk that it would be very interesting if they
> did the work behind the scenes.  So that you could code the obvious
> thing and it would do the right thing automatically.

Agreed.
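
As a rough illustration of a wrapper doing the work behind the scenes,
here is a userspace sketch, not kernel code.  The struct layouts, the
flat (non-ring) buffer, and the name netdev_nsprintk are all
hypothetical; only the net->user_ns->syslog_ns chain and the nsprintk
name come from the discussion above, and log levels such as KERN_NOTICE
are left out for brevity.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LOG_BUF_LEN 256

/* Hypothetical userspace stand-ins for the structures discussed. */
struct syslog_ns {
	char buf[LOG_BUF_LEN];	/* flat log buffer, not a real ring */
	size_t len;
};

struct user_ns { struct syslog_ns *syslog_ns; };
struct net { struct user_ns *user_ns; };
struct net_device { const char *name; struct net *net; };

struct syslog_ns init_syslog_ns;

/* Append one formatted message to a namespace's buffer.
 * A message that does not fit is discarded and -1 is returned. */
int ns_vprintk(struct syslog_ns *ns, const char *fmt, va_list ap)
{
	size_t room;
	int n;

	assert(ns != NULL);
	room = LOG_BUF_LEN - ns->len;
	n = vsnprintf(ns->buf + ns->len, room, fmt, ap);
	if (n < 0 || (size_t)n >= room) {
		ns->buf[ns->len] = '\0';	/* undo the partial write */
		return -1;
	}
	ns->len += (size_t)n;
	return n;
}

int nsprintk(struct syslog_ns *ns, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = ns_vprintk(ns, fmt, ap);
	va_end(ap);
	return n;
}

/* The caller names only the device; the wrapper finds the syslog ns
 * of the user ns that owns the device's network ns. */
int netdev_nsprintk(const struct net_device *dev, const char *fmt, ...)
{
	struct syslog_ns *ns = dev->net->user_ns->syslog_ns;
	va_list ap;
	int n;

	n = nsprintk(ns, "%s: ", dev->name);
	if (n < 0)
		return n;
	va_start(ap, fmt);
	n = ns_vprintk(ns, fmt, ap);
	va_end(ap);
	return n;
}
```

A caller would write netdev_nsprintk(dev, "link %s\n", "up") and the
message would land only in the owning container's buffer; whether it
should also be copied to init_syslog_ns with a ns prefix is the open
question mentioned earlier.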

> >> lowering the permissions in the network namespace is what to do about the
> >> commands that set the message loglevel.
> >
> > Here I'm not sure what you mean.
> 
> There is a possible DoS attack: by turning on debug messages in a
> user namespace you can overwhelm syslog.

Oh, I see.

> >> In general unless we can safely and sanely direct kernel messages into
> >> this new dmesg I don't actually see the point of having another ring
> >> buffer in the kernel.  If the only success is on the userspace side, having
> >> the syslog facility simply be unavailable seems more palatable.
> >
> > No, I didn't do any in this patch, but directing kernel messages into the
> > new dmesg was definitely a goal and should be trivial now.
> 
> Getting the semantics of which kernel messages should be directed at the
> new ring buffer and what that means seems to me to be a key factor in
> seeing how practical this is.  Otherwise this seems to call out for a
> change in userspace.

Ok, I was hoping that once there was a trivial-to-use nsprintk the
appropriate users would be converted by others :), but I can take a
look at converting compelling users before I resend.

> Certainly inside a user namespace now you can't destructively touch the
> kernel's syslog at all.

That should be true, yes.

thanks,
-serge


