User: Password:
|
|
Subscribe / Log in / New account

Re: [PATCH 17/18] fs: icache remove inode_lock

From:  Nick Piggin <npiggin-AT-kernel.dk>
To:  Dave Chinner <david-AT-fromorbit.com>
Subject:  Re: [PATCH 17/18] fs: icache remove inode_lock
Date:  Sat, 16 Oct 2010 07:50:45 +1100
Message-ID:  <20101015205045.GA20476@amd>
Cc:  Nick Piggin <npiggin-AT-kernel.dk>, Christoph Hellwig <hch-AT-infradead.org>, linux-fsdevel-AT-vger.kernel.org, linux-kernel-AT-vger.kernel.org
Archive-link:  Article

On Fri, Oct 15, 2010 at 09:59:43PM +1100, Dave Chinner wrote:
> On Fri, Oct 15, 2010 at 05:41:50PM +1100, Nick Piggin wrote:
> > You're worried about mere mortals reviewing and understanding it...
> > I don't really know. If you understand inode locking today, you
> > can understand the inode scaling series quite easily. Ditto for
> > dcache. rcu-walk path walking is trickier, but it is described in
> > detail in documentation and changelog.
> > 
> > And you can understand the high level approach without exactly
> > digesting every detail at once. The inode locking work goes to
> > break up all global locks:
> > 
> > - a single inode object is protected (to the same level as
> >   inode_lock) with i_lock. This makes it really trivial for
> >   filesystems to lock down the object without taking a global
> >   lock.
> 
> Which is unnecessarily wide, and results in i_lock  having to have
> list locks nested inside it, and that leads to the lock
> inversion try-lock mess that several people have complained about.

Gee, you keep repeating this so often that you have me doubting
myself a tiny bit, so I have to check.

$ grep spin_trylock fs/inode.c fs/fs-writeback.c
fs/inode.c:             if (!spin_trylock(&inode->i_lock)) {
fs/inode.c:             if (!spin_trylock(&old->i_lock)) {
fs/inode.c:             if (!spin_trylock(&old->i_lock)) {
fs/fs-writeback.c:      if (!spin_trylock(&inode->i_lock)) {
fs/fs-writeback.c:      if (!spin_trylock(&inode->i_lock)) {
fs/fs-writeback.c:      if (!spin_trylock(&inode->i_lock)) {

This is your unmaintainable mess?  You decided to rewrite your own
vfs-scale tree because I wanted i_lock to protect the icache state (and
offered you very good reason for)?

Well, surely they must be horrible complex and unmaintainable
beasts....

repeat:
	/*
	 * Don't use RCU walks, common case of no old inode
	 * found requires hash lock.
	 */
        spin_lock_bucket(b);
        hlist_bl_for_each_entry(old, node, &b->head, i_hash) {
                if (old->i_ino != ino)
                        continue;
                if (old->i_sb != sb)
                        continue;
                if (old->i_state & (I_FREEING|I_WILL_FREE))
                        continue;
                if (!spin_trylock(&old->i_lock)) {
                        spin_unlock_bucket(b);
                        cpu_relax();
                        goto repeat;
                }

Nope, no big deal. The rest are much the same. So thanks for the
repeated suggestion, but I'll actually prefer to keep my regular i_lock
locking scheme where you don't need to look up the documentation and
think hard about coherency between protected and unprotected parts of
the inode whenever you use it. I didn't stumble upon my locking design
by chance.

If you think a few trylocks == impending doom, then xfs is looking
pretty poorly at this point. So I would ask that you stop making things
up about my patch series. If you dislike the trylocks so much that you
think it is worth breaking the i_lock regularity or using RCU or
whatever, then please propose them as incremental patches to the end
of my series where you can see they logically will fit. You know I will
argue that locking consistency is more important for maintainability
than these few trylocks, however.



(Log in to post comments)


Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds