
revoke() returns

By Jonathan Corbet
December 18, 2007
LWN last looked at Pekka Enberg's revoke() patch in July, 2006. The purpose of this proposed system call is to completely disconnect all processes from a specific file, thus allowing a new process to have exclusive access to that file. There are a number of applications for this functionality, such as ensuring that a newly logged-in user is the only one able to access resources associated with the console - the sound device, for example. There are kernel developers who occasionally mutter ominously about unfixable security problems resulting from the lack of the ability to revoke open file descriptors - though they tend, for some reason, to not want to publish the details of those vulnerabilities. Any sort of real malware scanning application will also need to be able to revoke access to files determined to contain Bad Stuff.

Pekka has recently posted a new version of the patch, so a new look seems warranted. The first thing one notes is that the revoke() system call is gone; instead, the new form of the system call is:

    int revokeat(int dir_fd, const char *filename);

This call thus follows the form of a number of other, relatively new *at() system calls. Here, filename is the name of the file for which access is to be revoked; if it is an absolute pathname then dir_fd is ignored. Otherwise, dir_fd is an open file descriptor for the directory to be used as the starting point in the lookup of filename. The special value AT_FDCWD indicates the current working directory for the calling process. If the revokeat() call completes successfully, only file descriptors for filename which are created after the call will be valid.
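revokeat() is a proposed call from an out-of-tree patch, so it has no C library wrapper to demonstrate here. The lookup rules it borrows are the ones shared by the rest of the *at() family, though, and openat() follows the same rules. The following sketch uses openat() to show the three cases described above: a relative name resolved against dir_fd, an absolute name for which dir_fd is ignored, and AT_FDCWD. The helpers make_file() and can_open_at() are hypothetical names made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create an empty file `name` inside directory `dir`. */
static int make_file(const char *dir, const char *name)
{
    char path[4096];
    snprintf(path, sizeof path, "%s/%s", dir, name);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    return close(fd);
}

/*
 * Hypothetical helper: resolve `name` with the *at() rules and report
 * whether it could be opened. A relative name starts from dir_fd (or from
 * the current directory when dir_fd is AT_FDCWD); an absolute name ignores
 * dir_fd entirely. Returns 0 on success, -1 on failure.
 */
static int can_open_at(int dir_fd, const char *name)
{
    int fd = openat(dir_fd, name, O_RDONLY);
    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}
```

For example, can_open_at(-1, "/etc/passwd") succeeds because the absolute path means the bogus descriptor is never consulted, while can_open_at(-1, "passwd") fails with EBADF. Under the description above, revokeat() would resolve its filename argument the same way.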

There is a new file_operations member created by this patch set:

    int (*revoke)(struct file *filp);

This function's job is to ensure that any outstanding I/O operations on the given file have completed, with a failure status if needed. So far, the only implementation is a generic version for filesystems; it is, in its entirety:

    int generic_file_revoke(struct file *file)
    {
	return do_fsync(file, 1);
    }

In the long term, revokeat() will need support from at least a subset of device drivers to be truly useful.
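Userspace has no access to struct file_operations, but the shape of the hook can be modeled. In this hedged sketch, struct file_ops_model, generic_revoke_model() and run_revoke_hook() are hypothetical names, and a plain file descriptor stands in for the struct file pointer. The second argument of do_fsync() selects a data-only sync, which userspace spells fdatasync(). How the real patch treats a driver with no revoke() method is not described here; the model's choice to treat a missing hook as "nothing to flush" is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical model of the new hook; an fd stands in for struct file *. */
struct file_ops_model {
    int (*revoke)(int fd);
};

/*
 * Analogue of generic_file_revoke(): do_fsync(file, 1) requests a
 * data-only sync, which is what fdatasync() does from userspace.
 * Returns 0 on success, -1 (with errno set) on failure.
 */
static int generic_revoke_model(int fd)
{
    return fdatasync(fd);
}

static const struct file_ops_model regular_file_ops = {
    .revoke = generic_revoke_model,
};

/*
 * Run the hook if one exists. Treating a missing hook as "nothing to
 * flush" is an assumption of this model, not behavior taken from the patch.
 */
static int run_revoke_hook(const struct file_ops_model *ops, int fd)
{
    if (ops->revoke == NULL)
        return 0;
    return ops->revoke(fd);
}
```

Drivers that need more than a flush (cancelling in-flight device I/O, say) would supply their own revoke() method in place of the generic one.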

Disconnecting access to regular file descriptors is relatively straightforward; the system call simply iterates through the list of open files on the relevant device and replaces the file_operations structure with a new set which returns EBADF for every attempted operation. (OK, almost every attempted operation: reads from sockets and device files return zero instead.) The only tricky part is that it must iterate through the file list repeatedly until a pass finds no remaining open files for the target; otherwise there could be race conditions with other system calls creating new file descriptors at the same time that the old ones are being revoked.
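The operations swap and the repeated sweep can be illustrated with a toy, single-threaded model. This is not the patch's code: struct file, struct fops, revoke_all() and the rest are hypothetical simplified stand-ins, and a flat array takes the place of the kernel's list of open files. Following kernel convention, failures are returned as negative errno values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct file;

struct fops {
    ssize_t (*read)(struct file *f, char *buf, size_t len);
    ssize_t (*write)(struct file *f, const char *buf, size_t len);
};

enum file_kind { KIND_REGULAR, KIND_SOCKET, KIND_DEVICE };

struct file {
    int inode;               /* which file this open file refers to */
    enum file_kind kind;
    const struct fops *ops;  /* swapped out on revocation */
};

/* Normal operations: pretend every transfer completes in full. */
static ssize_t normal_read(struct file *f, char *buf, size_t len)
{ (void)f; (void)buf; return (ssize_t)len; }
static ssize_t normal_write(struct file *f, const char *buf, size_t len)
{ (void)f; (void)buf; return (ssize_t)len; }

/* Revoked operations: everything fails with EBADF ... */
static ssize_t revoked_read(struct file *f, char *buf, size_t len)
{ (void)f; (void)buf; (void)len; return -EBADF; }
static ssize_t revoked_write(struct file *f, const char *buf, size_t len)
{ (void)f; (void)buf; (void)len; return -EBADF; }
/* ... except reads on sockets and device files, which return zero. */
static ssize_t revoked_read_zero(struct file *f, char *buf, size_t len)
{ (void)f; (void)buf; (void)len; return 0; }

static const struct fops normal_fops = { normal_read, normal_write };
static const struct fops revoked_fops = { revoked_read, revoked_write };
static const struct fops revoked_zero_fops = { revoked_read_zero, revoked_write };

static int is_revoked(const struct file *f)
{
    return f->ops == &revoked_fops || f->ops == &revoked_zero_fops;
}

/*
 * Swap the revoked operations into every live open file for `inode`, and
 * keep sweeping until a full pass finds nothing left to revoke. In this
 * single-threaded model the final pass is always empty; in the kernel,
 * the extra passes catch files opened concurrently. Returns the number
 * of passes made.
 */
static int revoke_all(struct file *table, size_t n, int inode)
{
    int passes = 0;
    size_t found;

    do {
        found = 0;
        passes++;
        for (size_t i = 0; i < n; i++) {
            struct file *f = &table[i];
            if (f->inode != inode || is_revoked(f))
                continue;
            f->ops = (f->kind == KIND_REGULAR) ? &revoked_fops
                                               : &revoked_zero_fops;
            found++;
        }
    } while (found > 0);
    return passes;
}
```

Swapping a single pointer means no per-operation check is needed on the fast path; every later call on the old descriptor simply lands in the revoked stubs.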

The trickier part is dealing with memory mappings. In most cases, it is a matter of finding all virtual memory areas (VMAs) associated with the file, setting the new VM_REVOKED flag, and calling zap_page_range() to clear out the associated page table entries. The VM_REVOKED flag ensures that any attempt to fault pages back in will result in a SIGBUS signal - likely to be an unpleasant surprise for any process attempting to access that area.
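The mapping case can be sketched the same way. In this hypothetical model, struct vma_model stands in for a VMA, VM_REVOKED_MODEL for the VM_REVOKED flag, zap_range() for zap_page_range(), and handle_fault() for the page-fault path. Because the page-table entries are cleared, the next access to any page faults, and the flag turns that fault into SIGBUS instead of a fresh mapping of the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#define VM_REVOKED_MODEL 0x1u   /* hypothetical stand-in for VM_REVOKED */
#define NPAGES 4

struct vma_model {
    int inode;                  /* file backing this mapping */
    unsigned flags;
    const char *pte[NPAGES];    /* NULL means no page-table entry */
};

static const char file_page[] = "file contents";

/* Analogue of zap_page_range(): drop every page-table entry in the VMA. */
static void zap_range(struct vma_model *v)
{
    for (size_t i = 0; i < NPAGES; i++)
        v->pte[i] = NULL;
}

/* Flag every mapping of `inode` as revoked and clear its page tables. */
static void revoke_mappings(struct vma_model *vmas, size_t n, int inode)
{
    for (size_t i = 0; i < n; i++) {
        if (vmas[i].inode != inode)
            continue;
        vmas[i].flags |= VM_REVOKED_MODEL;
        zap_range(&vmas[i]);
    }
}

/*
 * Model of an access to `page`: returns 0 if the page is (or becomes)
 * mapped, or SIGBUS if the access faults in a revoked area.
 */
static int handle_fault(struct vma_model *v, size_t page)
{
    if (v->pte[page] != NULL)
        return 0;                     /* already mapped: no fault */
    if (v->flags & VM_REVOKED_MODEL)
        return SIGBUS;                /* refuse to fault the page back in */
    v->pte[page] = file_page;         /* normal fault: map the file page */
    return 0;
}
```

Mappings of other files are untouched; only the VMAs backed by the revoked file start raising SIGBUS.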

Even trickier is the case of private, copy-on-write (COW) mappings, which can be created when a process forks. Simply clearing those mappings might be effective, but it could result in the death of processes which do not actually need to be killed. At the same time, it is important that the COW mappings not become a way to leak data written to the file after the revokeat() call. So the COW mappings are separated from each other by a simple (but expensive) call to get_user_pages(), which creates private copies of all of the relevant pages.
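The effect of that COW-breaking step can be shown with a small model; it does not reproduce how get_user_pages() works internally. Here struct page_model, struct private_map, break_cow() and separate_private_mappings() are hypothetical names. Each private mapping that still shares the page-cache page receives its own copy of the current contents, so writes to the file made after revocation no longer show through.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 16

struct page_model {
    int refcount;
    char data[PAGE_BYTES];
};

/* A private mapping of one file page; shares the page until COW breaks. */
struct private_map {
    struct page_model *page;
};

/*
 * If the mapping still shares the page-cache page, give it a private copy
 * of the current contents. Returns 0 on success, -1 if allocation fails.
 */
static int break_cow(struct private_map *m, struct page_model *cache_page)
{
    if (m->page != cache_page)
        return 0;                          /* already private */
    struct page_model *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return -1;
    memcpy(copy->data, cache_page->data, PAGE_BYTES);
    copy->refcount = 1;
    cache_page->refcount--;                /* mapping no longer shares it */
    m->page = copy;
    return 0;
}

/* Separate every private mapping from the page cache (and each other). */
static int separate_private_mappings(struct private_map *maps, size_t n,
                                     struct page_model *cache_page)
{
    for (size_t i = 0; i < n; i++)
        if (break_cow(&maps[i], cache_page) != 0)
            return -1;
    return 0;
}
```

The expense mentioned above is visible even here: every shared page costs an allocation and a copy, whether or not the process would ever have touched it again.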

There has been relatively little discussion of this patch so far; perhaps the relevant developers have begun their holiday breaks and revoked their access to linux-kernel. This is an important patch with a lot of difficult, low-level operations, though; that is part of why it has been so long in the making. So it will need some comprehensive review before it can be considered ready for the mainline. Given the nature of the problem, it would not be surprising if another iteration or two were still needed.



revoke() returns

Posted Dec 23, 2007 21:54 UTC (Sun) by jcm (subscriber, #18262)

Oh, the puns...

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.