
O_DIRECT locking rework

From:  Chris Mason <chris.mason@oracle.com>
To:  linux-fsdevel@vger.kernel.org
Subject:  [RFC PATCH 0/2] O_DIRECT locking rework
Date:  Fri, 20 Oct 2006 14:32:37 -0400
Cc:  akpm@osdl.org, zach.brown@oracle.com

Hello everyone,

O_DIRECT locking currently uses a few different per-inode locks to
prevent races between buffered io and direct io.  This is awkward, and
it sometimes opens up races where we expose old data on disk.

For example, I can't quite see how we prevent an mmap-triggered
writepage from filling a hole in the middle of an O_DIRECT read.

This patch set changes O_DIRECT to use page locks instead of
mutexes/semaphores.  It looks in the radix tree for pages affected by this
O_DIRECT read/write and locks any pages it finds.

For any pages not present, a stub page struct is inserted into the
radix tree.  The page cache routines are changed to either wait on this
place holder page or ignore it as appropriate.  Place holders are not
valid pages at all; you can't trust page->index or any other field.
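
To make the place holder idea concrete, here is a minimal userspace
sketch.  It is not code from the patches: the toy_* names, the fixed-size
slot array standing in for the radix tree, and the "fail instead of
sleeping" behavior are all assumptions made for illustration.  It shows
real pages in a range being locked, empty slots being filled with a
sentinel, and a lookup that recognizes the sentinel by address without
reading any of its fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 16UL            /* toy size, stands in for the radix tree */

struct toy_page {                   /* hypothetical, not the kernel's struct page */
    bool locked;
    unsigned long index;
};

/* The place holder is identified by address only; its fields are never read. */
static struct toy_page toy_placeholder;
#define PLACEHOLDER (&toy_placeholder)

struct toy_mapping {
    struct toy_page *slots[CACHE_SLOTS];
};

static bool is_placeholder(const struct toy_page *p)
{
    return p == PLACEHOLDER;
}

/*
 * Lock every real page in [start, end] and put a place holder in every
 * empty slot.  Returns 0 on success, -1 on a bad range or a conflict.
 * Assumption: a real implementation would sleep on a conflict instead.
 */
static int toy_lock_range(struct toy_mapping *m,
                          unsigned long start, unsigned long end)
{
    unsigned long i;

    if (start > end || end >= CACHE_SLOTS)
        return -1;

    /* First pass: detect conflicts so a failure leaves nothing half done. */
    for (i = start; i <= end; i++) {
        struct toy_page *p = m->slots[i];

        if (p && (is_placeholder(p) || p->locked))
            return -1;
    }

    /* Second pass: take the whole range. */
    for (i = start; i <= end; i++) {
        if (m->slots[i])
            m->slots[i]->locked = true;
        else
            m->slots[i] = PLACEHOLDER;
    }
    return 0;
}

/* Undo toy_lock_range: drop place holders, unlock real pages. */
static void toy_unlock_range(struct toy_mapping *m,
                             unsigned long start, unsigned long end)
{
    unsigned long i;

    assert(start <= end && end < CACHE_SLOTS);
    for (i = start; i <= end; i++) {
        struct toy_page *p = m->slots[i];

        if (is_placeholder(p))
            m->slots[i] = NULL;
        else if (p)
            p->locked = false;
    }
}

enum toy_lookup { TOY_MISS, TOY_HIT, TOY_MUST_WAIT };

/*
 * Buffered-io style lookup.  A place holder is never handed back as a
 * page; the caller is told to wait (or may choose to treat it as a miss).
 */
static enum toy_lookup toy_find_page(struct toy_mapping *m,
                                     unsigned long index,
                                     struct toy_page **out)
{
    struct toy_page *p;

    *out = NULL;
    if (index >= CACHE_SLOTS)
        return TOY_MISS;
    p = m->slots[index];
    if (!p)
        return TOY_MISS;
    if (is_placeholder(p))
        return TOY_MUST_WAIT;
    *out = p;                       /* caller would lock the page as usual */
    return TOY_HIT;
}
```

The two-pass locking is a design choice of this sketch only: checking
for conflicts before touching any slot means a failed attempt needs no
cleanup.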

The first patch introduces these place holder pages.  The second patch
changes fs/direct-io.c to use them.  Patch #2 needs work:
direct-io.c:lock_page_range can be made much faster, and it needs to be
changed to work in chunks instead of pinning down the whole range at
once.
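
As a rough illustration of what "work in chunks" could look like, the
sketch below walks a page range a few pages at a time, so that at most
CHUNK_PAGES pages would be pinned at any moment.  Everything here is
hypothetical (the chunk size, the callback interface, the names); in real
code the callback would lock the chunk, do the io, and unlock it again.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_PAGES 4UL             /* assumed chunk size, for illustration */

struct chunk_stats {
    unsigned long chunks;           /* number of chunks visited */
    unsigned long max_pinned;       /* largest chunk, in pages */
    unsigned long total;            /* pages covered overall */
};

typedef void (*chunk_fn)(unsigned long first, unsigned long last, void *arg);

/* Call fn once per chunk of at most CHUNK_PAGES pages in [start, end]. */
static void for_each_chunk(unsigned long start, unsigned long end,
                           chunk_fn fn, void *arg)
{
    unsigned long first = start;

    assert(start <= end);
    for (;;) {
        unsigned long last = first + CHUNK_PAGES - 1;

        /* Clamp to the end of the range; also guards against wraparound. */
        if (last > end || last < first)
            last = end;

        fn(first, last, arg);       /* lock, do io, unlock would go here */

        if (last == end)
            break;
        first = last + 1;
    }
}

/* Example callback: only records how much each chunk would pin. */
static void record_chunk(unsigned long first, unsigned long last, void *arg)
{
    struct chunk_stats *s = arg;
    unsigned long n = last - first + 1;

    s->chunks++;
    s->total += n;
    if (n > s->max_pinned)
        s->max_pinned = n;
}
```

For a ten page range starting at index 0, this visits pages 0-3, 4-7 and
8-9, so no more than four pages are held at once.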

But, this is enough for people to comment on the basic idea.  Testing
has been very light. I'm not sure I've covered all of the buffered vs
direct races yet.  The main goal of posting now is to talk about the
place holder pages and possible optimizations.

For the XFS guys, you probably want to avoid the page locking steps as
well; a later version will honor that.

-chris



Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds