User: Password:
|
|
Subscribe / Log in / New account

Re: [PATCH] vfs: fix race in rcu lookup of pruned dentry

From:  Linus Torvalds <torvalds-AT-linux-foundation.org>
To:  Hugh Dickins <hughd-AT-google.com>
Subject:  Re: [PATCH] vfs: fix race in rcu lookup of pruned dentry
Date:  Mon, 18 Jul 2011 14:42:51 -0700
Message-ID:  <CA+55aFxq1KZycxXCwLARv0WOYG_-aim=e9kt=eLcNuymmeoyiA@mail.gmail.com>
Cc:  Al Viro <viro-AT-zeniv.linux.org.uk>, Andrew Morton <akpm-AT-linux-foundation.org>, Nick Piggin <npiggin-AT-kernel.dk>, linux-kernel-AT-vger.kernel.org, linux-fsdevel-AT-vger.kernel.org
Archive-link:  Article

On Mon, Jul 18, 2011 at 2:19 PM, Hugh Dickins <hughd@google.com> wrote:
>
> __d_lookup_rcu() is being careful about *inode, yes.
>
> But I'd forgotten it was even setting it: doesn't that setting get
> overridden later by the more careless *inode = path->d_entry->d_inode
> at the head of __follow_mount_rcu()'s loop?
>
> Perhaps that line just needs to be moved to the tail of the loop?

Ahh. Bingo. Yes, I think you found it.

I don't think it should touch that *inode value in
__follow_mount_rcu() unless we actually followed a mount, exactly
because it will overwrite the thing that we were so careful about in
__d_lookup_rcu().

So how about this patch that replaces the earlier mount-point sequence
number one. The only difference is (as you mention) to just do the
*inode update at the end of the loop, so that we don't overwrite the
valid inode data with a non-checked one when we don't do anything.

Untested. But this should make my propised change to fs/dcache.c be
irrelevant, because whether we clear d_inode or not, the existing
sequence number checks will catch it. Agreed?

                      Linus
 fs/namei.c |   21 +++++++++++++++++++--
 1 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5c867dd1c0b3..b9cd558a00be 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -938,11 +938,12 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 {
 	for (;;) {
 		struct vfsmount *mounted;
+		unsigned int seq;
+
 		/*
 		 * Don't forget we might have a non-mountpoint managed dentry
 		 * that wants to block transit.
 		 */
-		*inode = path->dentry->d_inode;
 		if (unlikely(managed_dentry_might_block(path->dentry)))
 			return false;
 
@@ -952,9 +953,25 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 		mounted = __lookup_mnt(path->mnt, path->dentry, 1);
 		if (!mounted)
 			break;
+		seq = read_seqcount_begin(&mounted->mnt_root->d_seq);
+
+		/*
+		 * The memory barrier in read_seqcount_begin() is sufficient,
+		 * so we can use __read_seqcount_retry() to check the prev
+		 * sequence numbers.
+		 */
+		if (!__read_seqcount_retry(&path->dentry->d_seq, nd->seq))
+			return false;
 		path->mnt = mounted;
 		path->dentry = mounted->mnt_root;
-		nd->seq = read_seqcount_begin(&path->dentry->d_seq);
+		nd->seq = seq;
+
+		/*
+		 * Update the inode too. We don't need to re-check the
+		 * dentry sequence number here after this d_inode read,
+		 * because a mount-point is always pinned.
+		 */
+		*inode = path->dentry->d_inode;
 	}
 	return true;
 }

(Log in to post comments)


Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds