|
|
Log in / Subscribe / Register

How to get rid of mmap_sem

How to get rid of mmap_sem

Posted May 13, 2019 0:39 UTC (Mon) by dgc (subscriber, #6611)
Parent article: How to get rid of mmap_sem

My experiments with range locks should not be dismissed just because I used "an xfs specific btree" to back the range lock or that it was used to track IO ranges. I've since implemented a generic btree with optimistic lock coupling (i.e. lockless lookups, and fine-grained per-node, full 64 bit range, scales to tens of millions of concurrent random lookup/insert/delete ops per second on trees containing millions of records across 16p) to get rid of the mutex that limited range lock scalability.

However, I'm getting the /almost identical results/ for the OBT as with the mutex protected XFS specific btree backing the range locks. The OBT range lock is a bit faster than both the XFS specific btree and a rwsem at 4 threads, but it's not scaling to 8 threads like a rwsem does in my IO experiments.

Basically, the problem with tree-based range locks is that as long as you only have a few locks being held you cannot make lock/unlock operations concurrent because non-overlapping (i.e. fast path) range locks are essentially 100% modification workload. Hence on a small tree they all hit the same node(s) and that quickly becomes, like a rwsem, a cacheline bouncing bound operation. This is by far the most important thing I've learnt from these experiments so far.

You can see this easily with AIO: you might be submitting 128 IOs per syscall and have them all in progress at once, but this still only means the lock depth is only 1 because we only lock the inode over AIO submission. And submission is serialised. Hence by the time you get to 8+ threads all submitting hundreds of AIO at once, the range lock depth is still only 8, and so it's still a single tree node they bang on, hence contend on a single cacheline just like a rwsem or a mutex protected single threaded tree.

IOWs, generic range locking turns out to be /really hard to scale/ and so you can't just replace a rwsem with a tree-based range lock and expect it to go lots faster. For some cases it will be better, but for just as many it will be no better or significantly worse because small trees will still hit the single contended cacheline wall....

-Dave.


to post comments


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds