
Sysctl: clean the code and prepare for secure use in containers

From:  Pavel Emelyanov <xemul@openvz.org>
To:  Andrew Morton <akpm@linux-foundation.org>
Subject:  [PATCH 0/3]Sysctl: clean the code and prepare for secure use in containers
Date:  Wed, 27 Feb 2008 16:47:10 +0300
Message-ID:  <47C569DE.9060208@openvz.org>
Cc:  Linux Kernel Mailing List <linux-kernel@vger.kernel.org>

Many (in fact most) sysctls make no sense on a per-container basis, e.g.
kernel.print_fatal_signals, vm.panic_on_oom, net.core.netdev_budget
and so on. Besides, tuning them from inside a container
is not even secure. On the other hand, hiding them completely from
the container's tasks sometimes causes user-space to stop working.

When developing net sysctl, the common practice was to duplicate
a table and drop the write bits in table->mode, but this approach
was not very elegant, led to excessive memory consumption and
was not suitable in general.
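
For illustration, the duplicate-and-strip practice described above can be
sketched in user space roughly as follows. This is not the kernel code:
struct ctl_table here is a simplified, hypothetical stand-in with only
two fields, and table_len() and dup_table_readonly() are names made up
for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct ctl_table. */
struct ctl_table {
	const char *procname;	/* NULL marks the end of the table */
	unsigned short mode;	/* permission bits, e.g. 0644 */
};

/* Number of entries, including the terminating sentinel. */
static size_t table_len(const struct ctl_table *t)
{
	size_t n = 0;

	while (t[n].procname)
		n++;
	return n + 1;
}

/*
 * Duplicate a whole table and drop the write bits in every entry's mode.
 * Returns NULL on allocation failure; the caller frees the copy.
 */
static struct ctl_table *dup_table_readonly(const struct ctl_table *src)
{
	struct ctl_table *copy;
	size_t i, n;

	assert(src != NULL);
	n = table_len(src);
	copy = malloc(n * sizeof(*copy));
	if (!copy)
		return NULL;
	memcpy(copy, src, n * sizeof(*copy));
	for (i = 0; copy[i].procname; i++)
		copy[i].mode &= ~0222;	/* clear the u/g/o write bits */
	return copy;
}
```

Every namespace that needs a read-only view pays for a full copy of the
table, which is where the memory overhead comes from.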

Here's the alternative solution. To facilitate per-container
sysctls, ctl_table_root-s were introduced. Each root contains a
list of ctl_table_header-s that are visible to different namespaces.
The idea of this set is to add a permissions() callback to the
ctl_table_root, so that each ctl root can limit the permissions on
the same ctl_table-s.
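
A minimal user-space sketch of the callback idea might look like the
code below. It is only a model under stated assumptions, not the hunk
from the patch: the structures are cut down to the fields needed here,
struct ctl_namespace is a hypothetical placeholder for whatever
namespace context the real callback receives, and ro_in_container() and
effective_mode() are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct ctl_table {
	const char *procname;
	unsigned short mode;	/* permission bits, e.g. 0644 */
};

/* Hypothetical placeholder for the caller's namespace context. */
struct ctl_namespace {
	int is_init;		/* non-zero for the initial (host) namespace */
};

struct ctl_table_root {
	/* Optional: returns the mode this root grants for the table. */
	int (*permissions)(struct ctl_table_root *root,
			   struct ctl_namespace *ns,
			   struct ctl_table *table);
};

/* Example policy: full mode on the host, write bits masked elsewhere. */
static int ro_in_container(struct ctl_table_root *root,
			   struct ctl_namespace *ns,
			   struct ctl_table *table)
{
	(void)root;
	if (ns->is_init)
		return table->mode;
	return table->mode & ~0222;
}

/* Ask the root for the mode if it has a callback, else use table->mode. */
static int effective_mode(struct ctl_table_root *root,
			  struct ctl_namespace *ns,
			  struct ctl_table *table)
{
	assert(table != NULL);
	if (root && root->permissions)
		return root->permissions(root, ns, table);
	return table->mode;
}
```

The table itself is shared and never modified; only the mode reported
to the caller differs, so no per-namespace copy is needed.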

The main user of this functionality is the net-namespaces code, 
but later this will (should) be used by more and more namespaces,
containers and control groups.

Actually, the core of this idea is a single hunk in the third patch.
The first two patches are cleanups for the sysctl code, while the third
one mostly extends the argument lists of some sysctl functions.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.