Credential records

By Jonathan Corbet
September 25, 2007
Every Linux process carries with it a set of credentials which describe its privileges within the system. Credentials include the user ID, group membership, capabilities, security context, and more. These credentials are currently stored in the task_struct structure associated with each process; an operation which changes credentials does so by operating directly on the task_struct structure. This approach has worked for many years, but it occasionally shows its age.

In particular, the current scheme makes life hard for kernel code which needs to adopt a different set of credentials for a limited time. In an attempt to remedy that situation, David Howells has posted a patch which significantly changes the handling of process credentials. The result is a more complex system, but also a system which is more flexible, and, with luck, more secure.

The core idea behind this patch is that all process credentials (attributes which describe how a process can operate on other objects) should be pulled out of the task structure into a separate structure of their own. The result is struct cred, which holds the effective filesystem user and group IDs, the list of group memberships, the effective capabilities, the process keyrings, a generic pointer for security modules, and some housekeeping information. The change also brings quite a bit of code churn, as every access to the old credential information must be changed to look into the new cred structure instead.

That churn is complicated by the fact that quite a bit of the credential information has not really moved to the cred structure; instead it is mirrored there. One of the fundamental rules for how struct cred works is that the structure can only be changed by the process it describes. So anything in the structure which can be changed by somebody else - capabilities and keyrings, for example - remains in the task_struct structure and is copied into the cred structure as needed. "As needed," for all practical purposes, means anytime those credentials are to be checked. So most system calls get decorated with this extra bit of code:

    result = update_current_cred();
    if (result < 0)
        return result;
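The mirroring step can be pictured with a small userspace model. What follows is only a sketch under stated assumptions: the structures, the field names, and model_update_current_cred() are hypothetical simplifications, not the code found in the patch. The model refreshes the snapshot only when the task-side value has changed, and it does so by attaching a fresh copy rather than editing the old structure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; the real structures hold far more. */
struct model_cred {
    unsigned int fsuid;
    unsigned long cap_effective;   /* snapshot of the task-side value */
};

struct model_task {
    unsigned long cap_effective;   /* may be changed by other processes */
    struct model_cred *cred;       /* changed only by the task itself */
};

/*
 * Bring the snapshot held in the cred structure up to date with the
 * task-side value. Returns 0 on success or -ENOMEM, matching the
 * "result < 0" convention used in the snippet above.
 */
static int model_update_current_cred(struct model_task *task)
{
    struct model_cred *fresh;

    assert(task != NULL && task->cred != NULL);

    if (task->cred->cap_effective == task->cap_effective)
        return 0;                  /* snapshot is already current */

    fresh = malloc(sizeof(*fresh));
    if (!fresh)
        return -ENOMEM;

    *fresh = *task->cred;          /* copy first, then change the copy */
    fresh->cap_effective = task->cap_effective;

    free(task->cred);              /* model only: assumes no other holders */
    task->cred = fresh;
    return 0;
}
```

In this model the common case, where nothing has changed since the last call, costs a single comparison; whether the real implementation is that cheap is not something the model can show.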

The next rule says that the cred structure can never be altered once it has been attached to a task. Instead, a read-copy-update technique must be used, wherein the cred structure is copied, the new copy is changed, then the pointer from the task_struct structure is set to the new structure. The old one, which is reference counted, persists while it is in use and is eventually disposed of via RCU.

There is a whole set of utility functions for dealing with credentials, a few of which are:

    struct cred *get_current_cred();
    void put_cred(struct cred *cred);

A call to get_current_cred() takes a reference to the current process's cred structure and returns a pointer to that structure. put_cred() releases a reference.

A change to a credentials structure usually involves a set of calls to:

    struct cred *dup_cred(const struct cred *cred);
    void set_current_cred(struct cred *cred);

The current credentials can be copied with dup_cred(); the duplicate, once modified, can be made current with set_current_cred(). A set of new hooks has been added to allow security modules to participate in the duplication and setting of credentials.
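The copy, modify, and replace sequence might look something like the following userspace model. It is a sketch, not the patch's implementation: every model_ name is hypothetical, the reference count is a plain integer, and the RCU-deferred freeing is replaced by an immediate free(). The model also assumes that setting new credentials takes over the caller's reference, which the article does not confirm.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified credentials structure. */
struct model_cred {
    int usage;                 /* reference count (not atomic here) */
    unsigned int fsuid;
    unsigned int fsgid;
};

/* Stands in for the pointer held in the current task's task_struct. */
static struct model_cred *model_current;

static struct model_cred *model_get_current_cred(void)
{
    model_current->usage++;
    return model_current;
}

static void model_put_cred(struct model_cred *cred)
{
    assert(cred->usage > 0);
    if (--cred->usage == 0)
        free(cred);            /* the real code would defer this via RCU */
}

static struct model_cred *model_dup_cred(const struct model_cred *cred)
{
    struct model_cred *copy = malloc(sizeof(*copy));

    if (!copy)
        return NULL;
    *copy = *cred;
    copy->usage = 1;           /* the copy starts with a single reference */
    return copy;
}

/* Assumption: takes over the caller's reference to cred. */
static void model_set_current_cred(struct model_cred *cred)
{
    struct model_cred *old = model_current;

    model_current = cred;
    if (old)
        model_put_cred(old);   /* drop the task's reference to the old one */
}

/* Change one field without ever touching the attached structure. */
static int model_change_fsuid(unsigned int fsuid)
{
    struct model_cred *copy = model_dup_cred(model_current);

    if (!copy)
        return -1;
    copy->fsuid = fsuid;
    model_set_current_cred(copy);
    return 0;
}
```

A caller which took a reference before the change keeps seeing the old values until it drops that reference, which is the property the copy-and-replace rule is meant to provide.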

So far, this infrastructure may seem like a bunch of extra work with the gain yet to be explained. The direction that David is going with this change can be seen with this new function:

    struct cred *get_kernel_cred(const char *service,
                                 struct task_struct *daemon);

The purpose of this function is to create a new credentials structure with the requisite privileges for the given service. The daemon pointer indicates a current process which should be used as the source for the new credentials - essentially, the new cred structure will enable its holder to act as if it were the daemon process. The current security module gets a chance to change how those credentials are set up; in fact, the interpretation of the "service" string is only done in security modules. In the absence of a security module, get_kernel_cred() will just duplicate the credentials held by daemon.
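The division of labor between the core code and the security module can be modeled as shown here. This is a sketch under assumptions: the hook type, the model_ names, the "cache" service string, and the example policy are all invented for illustration, and the model returns NULL on failure where kernel code would more likely return an error pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_cred {
    unsigned int fsuid;
    unsigned long cap_effective;
};

struct model_task {
    struct model_cred *cred;
};

/*
 * Hypothetical hook: a security module may adjust the new credentials
 * based on the service name. Returns 0, or a negative value to refuse.
 */
typedef int (*model_kernel_cred_hook)(struct model_cred *cred,
                                      const char *service);

static model_kernel_cred_hook model_hook;   /* NULL: no security module */

static struct model_cred *model_get_kernel_cred(const char *service,
                                                const struct model_task *daemon)
{
    struct model_cred *cred;

    assert(service != NULL && daemon != NULL && daemon->cred != NULL);

    cred = malloc(sizeof(*cred));
    if (!cred)
        return NULL;

    *cred = *daemon->cred;     /* default: duplicate the daemon's credentials */

    /* Only the security module, if any, interprets the service string. */
    if (model_hook && model_hook(cred, service) < 0) {
        free(cred);
        return NULL;
    }
    return cred;
}

/* Invented example policy: the "cache" service gets no capabilities. */
static int model_example_hook(struct model_cred *cred, const char *service)
{
    if (strcmp(service, "cache") == 0)
        cred->cap_effective = 0;
    return 0;
}
```

With model_hook left as NULL, the function returns a plain copy of the daemon's credentials, matching the fallback behavior described above; the daemon's own structure is never modified in either case.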

This capability is used in a new version of David's venerable FS-Cache (formerly cachefs) patch set. FS-Cache implements a local cache for network-based filesystems; the locally-stored cache will, naturally, have all of the same security concerns as the remote filesystem. There is a daemon which does a certain amount of the cache management work, but other accesses to the cache are performed by FS-Cache code running in the context of a process which is working with files on the remote filesystem. Using the above function, the FS-Cache code is able to empower any process to work with the privileges of the daemon process for just as long as is needed to get the filesystem work done.
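One way to picture "just as long as is needed" is a save, override, and restore pattern. The model below is an assumption about the general shape of such code, not a description of what FS-Cache actually does; model_with_cred(), model_cache_io(), and the single-field structure are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, single-field credentials structure. */
struct model_cred {
    unsigned int fsuid;
};

/* Stands in for the calling process's credentials pointer. */
static const struct model_cred *model_current;

/* Placeholder for cache work; reports the fsuid it ran under. */
static unsigned int model_cache_io(void)
{
    return model_current->fsuid;
}

/*
 * Run work() with cache_cred in effect, then put the caller's own
 * credentials back, whatever work() returned.
 */
static unsigned int model_with_cred(const struct model_cred *cache_cred,
                                    unsigned int (*work)(void))
{
    const struct model_cred *saved = model_current;
    unsigned int result;

    assert(cache_cred != NULL && work != NULL);

    model_current = cache_cred;    /* adopt the daemon-derived credentials */
    result = work();
    model_current = saved;         /* restore the caller's credentials */
    return result;
}
```

Since the credentials structures themselves are never edited, switching back is just a matter of restoring the saved pointer.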

The end result is that security policies can be carried further into the kernel than before. In the FS-Cache case, kernel code doing caching work always operates under the effective capabilities of the cache management daemon. So any protections, SELinux policies, etc. which apply to the daemon will also apply when FS-Cache work is being done in a different context. This should result in a more secure system overall.

The credential work is still in a relatively early state with a fair amount of work yet to be done. It will be quite a big patch by the time the required changes are made throughout the kernel. So this is not a 2.6.24 candidate. The work is progressing, though, so it will likely be knocking on the mainline door at some point.


Across fork()?

Posted Sep 28, 2007 22:53 UTC (Fri) by filker0 (guest, #31278) [Link]

How does this work with fork()? Does the new process end up with a pointer to the cred structure of its parent (reference count incremented, of course) until the new process makes a change?

I am somewhat bothered by the mirroring of task_struct information. It seems that a lot of extra overhead and churn would be caused by this, since each such change would allocate a new cred structure, copy data into it, then potentially discard the old one. Fragmentation is a danger of such an approach. I don't know enough to know how often this happens, though. I have this gut feeling that I'm missing something.

On VMS, any task could turn off any privs that it didn't need. I'm a bit fuzzy on whether this persisted to the end of the task or whether a task could regain the surrendered privilege (I know that, with the "SETPRIV" privilege connected to the user credentials, a task could, but without it, I can't recall). I can see this adding better security, so the extra overhead might be worth it.

The number of added calls (one on most system calls, if I read the text correctly), even if update_current_cred() is very efficient, also worries me; I'm an embedded programmer, and I know just how much overhead a call can add if it's on a critical path.

Across fork()?

Posted Oct 1, 2007 4:10 UTC (Mon) by jzbiciak (✭ supporter ✭, #5246) [Link]

I think the main idea is to take coherent snapshots of the current credentials at the moment a syscall's made, so that it can follow the request all the way through to completion.

In a multithreaded app, you could have races on some of the details, because not all credentials are per-thread. The kernel may have reason to examine your credentials more than once through the process of executing a system call, and those could be spaced widely in time.

Imagine symlink traversal over a slow link. I remember reading somewhere that Linux's support for deep directory structures and high levels of symlink nesting means a single directory lookup could cause 300MB of disk to get read if you set things up right. :-) An attacker would be motivated to do just that.

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds