From: Janak Desai <janak@us.ibm.com>
To: viro@parcelfarce.linux.theplanet.co.uk
Subject: [PATCH 0/2] unshare: new system call
Date: Thu, 7 Jul 2005 13:19:49 -0400 (Eastern Daylight Time)
Cc: linux-fsdevel@vger.kernel.org

Hi Al,
As suggested before, now that 2.6.12 is out, I am resubmitting the
unshare() system call patch for inclusion in the -mm tree. The only
change since it was last sent to this list as an RFC is the improved
argument checking suggested here.
Thanks.
-Janak
Patch Summary:
This patch implements a new system call, unshare. unshare allows a
process to dissociate parts of its context that were initially
shared through the clone() system call.
The patch consists of two parts:
[1/2] Implements the system call handler function sys_unshare.
[2/2] Implements system call setup for the i386 architecture, on
which the patch was tested.
Patch Justification:
The unshare system call is needed to implement, using PAM,
per-security_context and/or per-user namespaces that provide
polyinstantiated directories. Using unshare and bind mounts, a
PAM module can create a private namespace with the appropriate
directories (based on the user's security context) bind mounted
on public directories such as /tmp, thus providing an instance of
/tmp that is specific to that security context. Without the
unshare system call, namespace separation can only be achieved
with clone, which would require porting and maintaining every
command that establishes a user session, such as login, su, gdm,
and sshd.
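
As a rough illustration of the PAM use case described above, the
following sketch detaches the calling process from the shared namespace
and bind mounts a per-context directory over /tmp. It is not taken from
the patch. The helper names instance_path() and polyinstantiate_tmp()
and the "<base>/<context>" directory layout are hypothetical. The call
is made through syscall(2) because no libc wrapper existed when this
was posted, and it needs sufficient privilege (CAP_SYS_ADMIN) to
succeed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/sched.h>   /* CLONE_NEWNS */
#include <stdio.h>         /* snprintf() */
#include <string.h>        /* strchr() */
#include <sys/mount.h>     /* mount(), MS_BIND */
#include <sys/syscall.h>   /* SYS_unshare */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: write "<base>/<context>" into buf.
 * Returns 0 on success, -1 if the context is unusable or the
 * result does not fit in len bytes. */
int instance_path(char *buf, size_t len, const char *base,
                  const char *context)
{
    assert(buf != NULL && base != NULL && context != NULL);

    /* Reject an empty context or one containing a path separator. */
    if (context[0] == '\0' || strchr(context, '/') != NULL)
        return -1;

    int n = snprintf(buf, len, "%s/%s", base, context);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: give the caller a private namespace, then
 * bind mount instance_dir on top of /tmp inside it.
 * Returns 0 on success, -1 on failure with errno set. */
int polyinstantiate_tmp(const char *instance_dir)
{
    /* Stop sharing the namespace with the parent. */
    if (syscall(SYS_unshare, CLONE_NEWNS) != 0)
        return -1;

    /* On later kernels where mounts default to shared propagation,
     * the new namespace would first have to be marked private. */
    if (mount(instance_dir, "/tmp", NULL, MS_BIND, NULL) != 0)
        return -1;

    return 0;
}
```

A session-opening module would call instance_path() with the user's
security context and pass the result to polyinstantiate_tmp(). Current
glibc exposes the same call as unshare() in <sched.h>.
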
Overall Approach:
The overall approach follows the clone system call and its permission
enforcement. However, instead of clone's "what do we leave shared?"
logic, the logic here is "what do we unshare that was previously
shared?". Unlike clone, which operates on a newly allocated and
not-yet-schedulable task structure, unshare has to work on the
current process, so additional task_lock()s are taken to avoid race
conditions. Before unsharing any part of the context, a check is made
to ensure that it is being shared in the first place. If it is not,
the system call returns success. If it is, the system call makes a
private copy of that context and updates the appropriate pointers of
the current task structure to point to the new copy. If allocation
or setup of the private copy fails, the system call restores the
current task structure so that it continues to use the shared
context.
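
The check-copy-swap sequence described above can be modelled in user
space roughly as follows. This is a simplified sketch, not the kernel
code: struct ctx, struct task, the allocator hooks and unshare_ctx()
are all hypothetical names, a plain reference count stands in for
"is this context shared?", and the task_lock() calls are left out. In
this model the task pointer is only switched after the copy succeeds,
so a failed allocation leaves the task on the shared context without
an explicit restore step.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a shareable context and a task. */
struct ctx  { int refcount; int data; };
struct task { struct ctx *ctx; };

/* Allocator hook, so that allocation failure can be simulated. */
typedef struct ctx *(*ctx_alloc_fn)(void);

struct ctx *ctx_alloc_default(void) { return malloc(sizeof(struct ctx)); }
struct ctx *ctx_alloc_fail(void)    { return NULL; }

/* Give task t a private copy of its context if it is shared.
 * Returns 0 on success (including "was not shared"), or -ENOMEM
 * with t still pointing at the shared context. */
int unshare_ctx(struct task *t, ctx_alloc_fn alloc)
{
    struct ctx *old = t->ctx;
    assert(old != NULL && old->refcount >= 1);

    /* Not shared to begin with: nothing to do, report success. */
    if (old->refcount == 1)
        return 0;

    struct ctx *copy = alloc();
    if (copy == NULL)
        return -ENOMEM;          /* t->ctx is untouched */

    *copy = *old;                /* duplicate the shared state */
    copy->refcount = 1;          /* the copy has a single user */

    t->ctx = copy;               /* switch the task over ... */
    old->refcount--;             /* ... and drop its old reference */
    return 0;
}
```
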
Currently, the system call only allows "unsharing" of the namespace,
signal handlers and virtual memory, because those three were deemed
useful on this mailing list in the past.
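
For illustration only, a restriction to these three contexts could be
expressed as a mask over the corresponding clone flags. The sketch
below is not the argument check from the patch: the macro
UNSHARE_SUPPORTED, the function check_unshare_flags() and the choice
of -EINVAL for unsupported flags are assumptions made for the example.

```c
#include <assert.h>        /* static_assert (C11) */
#include <errno.h>         /* EINVAL */
#include <linux/sched.h>   /* CLONE_* flag values */

/* The three contexts listed above: namespace, signal handlers
 * and virtual memory. */
#define UNSHARE_SUPPORTED (CLONE_NEWNS | CLONE_SIGHAND | CLONE_VM)

static_assert((UNSHARE_SUPPORTED & CLONE_FILES) == 0,
              "file descriptor tables are not in the supported set");

/* Hypothetical check: returns 0 if every requested bit is in the
 * supported set, -EINVAL otherwise (the error code is assumed). */
int check_unshare_flags(unsigned long flags)
{
    if (flags & ~(unsigned long)UNSHARE_SUPPORTED)
        return -EINVAL;
    return 0;
}
```
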
Testing:
The patch has been unit tested on a uniprocessor i386 Fedora Core 3
system running a 2.6.12 kernel.
Signed-off-by: Janak Desai