Re: [PATCH 17/18] fs: icache remove inode_lock
[Posted October 26, 2010 by corbet]
| From: |
| Nick Piggin <npiggin-AT-kernel.dk> |
| To: |
| Dave Chinner <david-AT-fromorbit.com> |
| Subject: |
| Re: [PATCH 17/18] fs: icache remove inode_lock |
| Date: |
| Sat, 16 Oct 2010 07:50:45 +1100 |
| Message-ID: |
| <20101015205045.GA20476@amd> |
| Cc: |
| Nick Piggin <npiggin-AT-kernel.dk>,
Christoph Hellwig <hch-AT-infradead.org>,
linux-fsdevel-AT-vger.kernel.org, linux-kernel-AT-vger.kernel.org |
| Archive‑link: | |
Article |
On Fri, Oct 15, 2010 at 09:59:43PM +1100, Dave Chinner wrote:
> On Fri, Oct 15, 2010 at 05:41:50PM +1100, Nick Piggin wrote:
> > You're worried about mere mortals reviewing and understanding it...
> > I don't really know. If you understand inode locking today, you
> > can understand the inode scaling series quite easily. Ditto for
> > dcache. rcu-walk path walking is trickier, but it is described in
> > detail in documentation and changelog.
> >
> > And you can understand the high level approach without exactly
> > digesting every detail at once. The inode locking work goes to
> > break up all global locks:
> >
> > - a single inode object is protected (to the same level as
> > inode_lock) with i_lock. This makes it really trivial for
> > filesystems to lock down the object without taking a global
> > lock.
>
> Which is unnecessarily wide, and results in i_lock having to have
> list locks nested inside it, and that leads to the lock
> inversion try-lock mess that several people have complained about.
Gee, you keep repeating this so often that you have me doubting
myself a tiny bit, so I have to check.
$ grep spin_trylock fs/inode.c fs/fs-writeback.c
fs/inode.c: if (!spin_trylock(&inode->i_lock)) {
fs/inode.c: if (!spin_trylock(&old->i_lock)) {
fs/inode.c: if (!spin_trylock(&old->i_lock)) {
fs/fs-writeback.c: if (!spin_trylock(&inode->i_lock)) {
fs/fs-writeback.c: if (!spin_trylock(&inode->i_lock)) {
fs/fs-writeback.c: if (!spin_trylock(&inode->i_lock)) {
This is your unmaintainable mess? You decided to rewrite your own
vfs-scale tree because I wanted i_lock to protect the icache state (and
offered you very good reason for)?
Well, surely they must be horrible complex and unmaintainable
beasts....
repeat:
/*
* Don't use RCU walks, common case of no old inode
* found requires hash lock.
*/
spin_lock_bucket(b);
hlist_bl_for_each_entry(old, node, &b->head, i_hash) {
if (old->i_ino != ino)
continue;
if (old->i_sb != sb)
continue;
if (old->i_state & (I_FREEING|I_WILL_FREE))
continue;
if (!spin_trylock(&old->i_lock)) {
spin_unlock_bucket(b);
cpu_relax();
goto repeat;
}
Nope, no big deal. The rest are much the same. So thanks for the
repeated suggestion, but I'll actually prefer to keep my regular i_lock
locking scheme where you don't need to look up the documentation and
think hard about coherency between protected and unprotected parts of
the inode whenever you use it. I didn't stumble upon my locking design
by chance.
If you think a few trylocks == impending doom, then xfs is looking
pretty poorly at this point. So I would ask that you stop making things
up about my patch series. If you dislike the trylocks so much that you
think it is worth breaking the i_lock regularity or using RCU or
whatever, then please propose them as incremental patches to the end
of my series where you can see they logically will fit. You know I will
argue that locking consistency is more important for maintainability
than these few trylocks, however.