unshare: new system call

From:  Janak Desai <janak@us.ibm.com>
To:  viro@parcelfarce.linux.theplanet.co.uk
Subject:  [PATCH 0/2] unshare: new system call
Date:  Thu, 7 Jul 2005 13:19:49 -0400 (Eastern Daylight Time)
Cc:  linux-fsdevel@vger.kernel.org


Hi Al,

As suggested before, now that 2.6.12 is out, I am resubmitting the 
unshare() system call patch for inclusion in the -mm tree. The only
change since the patch was last posted to this list as an RFC is the
improved argument checking suggested here.

Thanks. 

-Janak

Patch Summary:
This patch implements a new system call, unshare.  unshare allows a 
process to dissociate parts of its process context that were 
originally shared through the clone() system call.

The patch consists of two parts:
[1/2] Implements the system call handler function sys_unshare.
[2/2] Implements system call setup for i386 architecture, on which
      the patch was tested.

Patch Justification:
The unshare system call is needed to implement, using PAM, 
per-security-context and/or per-user namespaces that provide 
polyinstantiated directories. Using unshare and bind mounts, a 
PAM module can create a private namespace with the appropriate 
directories (based on the user's security context) bind mounted on 
public directories such as /tmp, thus providing an instance of 
/tmp that is based on the user's security context. Without the 
unshare system call, namespace separation can only be achieved 
with clone, which would require porting and maintaining every command 
that establishes a user session, such as login, su, gdm, and sshd. 
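
To make the PAM use case concrete, here is a minimal userspace sketch of
the unshare-then-bind-mount sequence. It assumes the unshare() wrapper and
CLONE_NEWNS flag as exposed by present-day glibc, which is not something
this patch provides. The helper names (instance_path, polyinstantiate) and
the "<root>/<context>" directory layout are hypothetical and are not taken
from the patch or from any PAM module.

```c
#define _GNU_SOURCE
#include <sched.h>      /* unshare(), CLONE_NEWNS (assumed glibc wrapper) */
#include <stdio.h>      /* snprintf() */
#include <sys/mount.h>  /* mount(), MS_BIND */

/*
 * Build the per-context instance directory as "<root>/<context>".
 * The layout is hypothetical. Returns 0 on success, -1 if buf is
 * too small to hold the result.
 */
int instance_path(char *buf, size_t len, const char *root, const char *context)
{
    int n = snprintf(buf, len, "%s/%s", root, context);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Give the calling process a private namespace, then bind mount its
 * instance directory over the public one (for example /tmp).
 * Requires mount privileges. Returns 0 on success, -1 with errno set
 * on failure.
 */
int polyinstantiate(const char *instance_dir, const char *public_dir)
{
    if (unshare(CLONE_NEWNS) != 0)
        return -1;

    return mount(instance_dir, public_dir, NULL, MS_BIND, NULL);
}
```

A session-setup hook could call instance_path() with the user's security
context and then polyinstantiate(path, "/tmp"). On systems that mount /
with shared propagation, the new namespace's mounts would also have to be
marked private (MS_REC | MS_PRIVATE) before the bind mount; that step is
omitted here.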

Overall Approach:
The overall approach follows the clone system call and its permission
enforcement. However, instead of clone's "what do we leave shared?" 
logic, the logic here is based on "what do we unshare, that was 
previously being shared?". Unlike clone, which operates on a newly 
allocated and not-yet-schedulable task structure, unshare has to work
on the current process, so additional task_lock()s are taken to avoid
race conditions. Before unsharing any part of the context, a check is
made to ensure that this part of the context is being shared in the
first place. If the context is not being shared to begin with, the
system call returns success. If the context is being shared, the
system call makes a private copy of that context and updates the
appropriate pointers of the current task structure to point to this
new private copy. If allocation and setup of the private copy fail,
the system call restores the current task structure so that it
continues to use the shared context.
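
The check / copy / swap / fall-back sequence described above can be
illustrated with a small userspace toy model. This is not the code from
the patch: struct ctx, struct task and toy_unshare_ctx() are hypothetical
stand-ins, a plain reference count takes the place of the kernel's sharing
checks, and locking is left out entirely.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one shareable piece of process context. */
struct ctx {
    int refcount;   /* number of tasks pointing at this context */
    char data[32];  /* payload that a private copy must duplicate */
};

/* Hypothetical stand-in for a task structure. */
struct task {
    struct ctx *ctx;
};

/*
 * Returns 0 on success (including "was not shared"), -1 if the private
 * copy could not be allocated. On failure the task is left pointing at
 * the shared context.
 */
int toy_unshare_ctx(struct task *t)
{
    struct ctx *old = t->ctx;
    struct ctx *copy;

    if (old->refcount == 1)
        return 0;               /* not shared: nothing to do */

    copy = malloc(sizeof(*copy));
    if (copy == NULL)
        return -1;              /* keep using the shared context */

    memcpy(copy->data, old->data, sizeof(copy->data));
    copy->refcount = 1;

    t->ctx = copy;              /* switch to the private copy */
    old->refcount--;            /* drop our reference to the shared one */
    return 0;
}
```

In this model the pointer in the task is only updated after the copy is
fully set up, so the failure path needs no explicit restore step. The real
system call works on the live task structure and, as described above, has
to take task_lock() around the update.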

Currently, the system call only allows "unsharing" of namespace, 
signal handlers and virtual memory, because those three were deemed 
useful on this mailing list in the past. 
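
As an illustration of restricting the call to these three kinds of
context, the sketch below rejects any other flag bits. It assumes that
namespace, signal handlers and virtual memory map to the clone flags
CLONE_NEWNS, CLONE_SIGHAND and CLONE_VM, and that unsupported flags yield
-EINVAL; the actual argument checking in the patch may differ. The
constants are redefined locally under TOY_ names, with the values those
flags have in the Linux headers, so that the example stands alone.

```c
#include <errno.h>

/* Local copies of the clone flag values (see lead-in for assumptions). */
#define TOY_CLONE_VM      0x00000100UL  /* virtual memory */
#define TOY_CLONE_SIGHAND 0x00000800UL  /* signal handlers */
#define TOY_CLONE_NEWNS   0x00020000UL  /* namespace */

/*
 * Hypothetical argument check: accept only the three supported flags.
 * Returns 0 if flags is acceptable, -EINVAL otherwise.
 */
int toy_check_unshare_flags(unsigned long flags)
{
    const unsigned long allowed =
        TOY_CLONE_VM | TOY_CLONE_SIGHAND | TOY_CLONE_NEWNS;

    if (flags & ~allowed)
        return -EINVAL;

    return 0;
}
```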

Testing:
The patch has been unit tested on a uniprocessor i386 Fedora Core 3
system running a 2.6.12 kernel.

Signed-off-by: Janak Desai



Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds