|From:||Linus Torvalds <torvalds-AT-linux-foundation.org>|
|To:||Ingo Molnar <mingo-AT-kernel.org>|
|Subject:||Re: [PATCH] mutexes: Add CONFIG_DEBUG_MUTEX_FASTPATH=y debug variant to debug SMP races|
|Date:||Tue, 3 Dec 2013 10:10:29 -0800|
|Cc:||Simon Kirby <sim-AT-hostway.ca>, Peter Zijlstra <peterz-AT-infradead.org>, Waiman Long <Waiman.Long-AT-hp.com>, Ian Applegate <ia-AT-cloudflare.com>, Al Viro <viro-AT-zeniv.linux.org.uk>, Christoph Lameter <cl-AT-gentwo.org>, Pekka Enberg <penberg-AT-kernel.org>, LKML <linux-kernel-AT-vger.kernel.org>, Chris Mason <chris.mason-AT-fusionio.com>, Thomas Gleixner <tglx-AT-linutronix.de>|
On Tue, Dec 3, 2013 at 12:52 AM, Ingo Molnar <email@example.com> wrote: > > I'd expect such bugs to be more prominent with unlucky object > size/alignment: if mutex->count lies on a separate cache line from > mutex->wait_lock. I doubt that makes much of a difference. It's still just "CPU cycles" away, and the window will be tiny unless you have multi-socket machines and/or are just very unlucky. For stress-testing, it would be much better to use some hack like /* udelay a bit if the spinlock isn't contended */ if (mutex->wait_lock.ticket.head+1 == mutex->wait_lock.ticket.tail) udelay(1); in __mutex_unlock_common() just before the spin_unlock(). Make the window really *big*. That said: >> So having a non-atomic refcount protected inside a sleeping lock is >> a bug, and that's really the bug that ended up biting us for pipes. >> >> Now, the question is: do we have other such cases? How do we >> document this? [...] > > I'd not be surprised if we had such cases, especially where > spinlock->mutex conversions were done So I actually don't think we do. Why? Having a sleeping lock inside the object that protects the refcount is actually hard to do. You need to increment the refcount when finding the object, but that in turn tends to imply holding the lock *before* you find it. Or you have to find it locklessly, which in turn implies RCU, which in turn implies non-sleeping lock. And quite frankly, anybody who uses SRCU and a sleeping lock is just broken to begin with. That's just an abomination. If the RT codepaths do something like that, don't even tell me. It's broken and stupid. So protecting a refcount with a mutex in the same object is really quite hard. Even the pipe code didn't actually do that, it just happened to nest the real lock inside the sleeping lock (but that's also why it was so easy to fix - the sleeping lock wasn't actually necessary or helpful in the refcounting path). So my *gut* feel is that nobody does this. But people have been known to do insane things just because they use too many sleeping locks and think it's "better". Linus
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds