LWN: Comments on "Range reader/writer locks for the kernel"

How about update locks?

andresfreund — Sat, 17 Jun 2017 01:31:00 +0000

> Range locks would be very useful for certain systems, though - relational databases spring to mind.

Hence most databases having row-level locking.

How about update locks?

nybble41 — Sat, 17 Jun 2017 01:03:45 +0000

> So when the user updated the system, the program ran in read-only mode until they hit "confirm", at which point it opened the ledger master files followed by the individual ledger files in read-write mode, and replayed the update for real.

So what happened when two instances of the program prepared conflicting updates? Obviously only one can replay its update at a time, but whichever program goes second won't be aware of the first update while preparing its changes. Does the update fail after it sees that the data changed, or does it simply overwrite the changes the first program did with its own changes based on obsolete data?

This is, I believe, the problem that update locks are designed to solve. They indicate an intention to update the record in the future (after upgrading to an exclusive lock). Only one thread can prepare an update at a time. In the meantime, other threads can still read the data so long as they aren't preparing to make an update based on it. It's a similar concept to a reader/writer lock except that with an R/W lock there is no coherent way to atomically upgrade from a reader lock to a writer lock without the possibility of failure. (What would happen if multiple threads tried to upgrade? One would have to go first, and then the others would either fail to upgrade to a writer lock or see different data than was present before upgrading.) An update lock is like a "privileged" reader lock in the sense that there can be many readers, but only one of them (the updater) is able to upgrade to a writer lock.
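To make the three modes concrete, here is a minimal userspace sketch in Python of a lock with shared, update, and exclusive modes. It is only an illustration of the semantics described above, not any kernel or DLM API; the class name `UpdateLock` and all of its method names are made up for this example.

```python
import threading

class UpdateLock:
    """Toy lock with shared (read), update, and exclusive modes (hypothetical API)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0        # number of plain shared holders
        self._updater = False    # at most one update-mode holder
        self._exclusive = False  # True once the updater has asked for exclusive

    def acquire_read(self):
        with self._cond:
            # Readers coexist with the updater, but not with an exclusive holder.
            while self._exclusive:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_update(self):
        with self._cond:
            # Only one thread may be preparing an update at a time.
            while self._updater:
                self._cond.wait()
            self._updater = True

    def upgrade(self):
        # Only the single update holder can get here, so two upgrades never race.
        with self._cond:
            assert self._updater, "upgrade() requires the update lock"
            self._exclusive = True      # stop admitting new readers
            while self._readers:        # wait for existing readers to drain
                self._cond.wait()

    def release_update(self):
        # Drops the update lock, and the exclusive mode if it was taken.
        with self._cond:
            self._updater = False
            self._exclusive = False
            self._cond.notify_all()
```

Because there is never more than one updater, the upgrade cannot fail or deadlock against another upgrader; it only has to wait for the ordinary readers to finish.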

How about update locks?

Wol — Fri, 16 Jun 2017 21:47:35 +0000

> Update locks are typically used in combination with "CR" (aka reader lock) and "EX" (exclusive lock) modes, with an API to atomically convert from update to exclusive.

When I wrote an accounts system, the OS allowed you to specify "many readers or one writer", "many readers and one writer", or "many readers and many writers". So all the accounts files were spec'd as "many readers and one writer".

So when the user updated the system, the program ran in read-only mode until they hit "confirm", at which point it opened the ledger master files followed by the individual ledger files in read-write mode, and replayed the update for real. (It does help if you have a coherent overall design when you need to do locking :-)

Range locks would be very useful for certain systems, though - relational databases spring to mind.

Cheers,
Wol

Range reader/writer locks for the kernel

allenbh — Wed, 07 Jun 2017 12:03:07 +0000

While the article mentions locking different parts of the range, presumably by different threads, it doesn't specifically mention one thread locking different parts of the range at the same time, or any constraints on the order of acquiring those locks in that case. I wonder whether the implementation will do cycle detection at run time, leave it up to users of the lock not to introduce cyclic deadlocks, or only allow locking one contiguous part of the range at a time. I also wonder how effective static analysis will be, and what kind of techniques might need to be added for static analysis to check the use of range locks.
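As an illustration of the "leave it up to users" option, here is a small Python sketch of a toy exclusive range lock plus a caller-side convention of always taking ranges in ascending order. None of this reflects the kernel implementation: `RangeLock` and `lock_in_order` are hypothetical names, ranges are half-open `[start, end)`, and the ranges passed by one caller are assumed to be disjoint (overlapping ones would deadlock against themselves).

```python
import threading

class RangeLock:
    """Toy exclusive range lock over half-open intervals [start, end)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._held = []  # (start, end) pairs currently locked

    def _overlaps(self, start, end):
        return any(s < end and start < e for s, e in self._held)

    def lock(self, start, end):
        with self._cond:
            # Wait until no held range intersects the requested one.
            while self._overlaps(start, end):
                self._cond.wait()
            self._held.append((start, end))

    def unlock(self, start, end):
        with self._cond:
            self._held.remove((start, end))
            self._cond.notify_all()

def lock_in_order(range_lock, ranges):
    """Acquire several disjoint ranges in ascending start order.

    If every caller follows the same order, two callers cannot each hold
    a range that the other is waiting for.
    """
    ordered = sorted(ranges)
    for start, end in ordered:
        range_lock.lock(start, end)
    return ordered
```

A caller would release with `unlock()` on each returned range. Whether the kernel code should enforce such an ordering, detect cycles, or forbid multiple ranges per thread is exactly the open question.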

How about update locks?

neilbrown — Wed, 07 Jun 2017 01:20:00 +0000

> A great use case is when one wants to prepare some kind of structure update privately while letting readers work, and then convert to exclusive and atomically publish the update:

The normal approach to that use case in the kernel is to prepare the update, then grab the exclusive lock, then see if the structure has changed since you started preparing the update (e.g. using a sequence counter). If it has, bail out and start again. If it hasn't, publish the update.
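The prepare, lock, check, publish-or-retry pattern can be sketched in a few lines of userspace Python. This is only an illustration of the idea, not kernel code (the kernel has its own seqcount primitives); `Versioned` and `update` are names invented for the example.

```python
import threading

class Versioned:
    """Shared value guarded by an exclusive lock and a sequence counter."""

    def __init__(self, value):
        self.lock = threading.Lock()  # the exclusive lock
        self.seq = 0                  # bumped on every published update
        self.value = value

def update(shared, compute):
    """Publish compute(old_value); return how many attempts it took."""
    attempts = 0
    while True:
        attempts += 1
        seen_seq = shared.seq          # remember the version we started from
        snapshot = shared.value
        new_value = compute(snapshot)  # the expensive part, done without the lock
        with shared.lock:
            if shared.seq != seen_seq:
                continue               # someone else published first: start again
            shared.value = new_value
            shared.seq += 1
            return attempts
```

The exclusive lock is held only for the comparison and the final store, which is the same goal the update lock aims at, reached with simpler primitives at the cost of occasionally redoing the preparation.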

I imagine that approach would not be ideal in the cluster context that DLM was designed for, as latencies would be higher etc. In the kernel, it seems to work well.

As a general rule, keeping the locks simple minimizes the time it takes to claim and release them. Splitting locks (such as replacing a per-hash-table lock with lots of per-hash-chain locks) tends to be the better approach to scalability, rather than anything more complex than mutual exclusion.
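For readers unfamiliar with lock splitting, a rough Python sketch of the hash-table example follows: one plain mutex per bucket instead of one for the whole table. The `StripedTable` class and its methods are hypothetical and only meant to show the shape of the idea.

```python
import threading

class StripedTable:
    """Hash table with one simple mutex per bucket instead of one global lock."""

    def __init__(self, nbuckets=16):
        self._buckets = [dict() for _ in range(nbuckets)]
        self._locks = [threading.Lock() for _ in range(nbuckets)]

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        i = self._index(key)
        with self._locks[i]:  # only this bucket is serialized
            self._buckets[i][key] = value

    def get(self, key, default=None):
        i = self._index(key)
        with self._locks[i]:
            return self._buckets[i].get(key, default)
```

Threads touching different buckets never contend, and each lock stays a plain mutex with cheap acquire and release.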

Range locks are handling a fairly unique case. Files are used in an enormous variety of ways - sometimes as a whole, sometimes as lots of individual records. In some cases the whole-file mmap_sem really is simplest and best. Other times per-page locks are best. But sometimes, taking mmap_sem will cause too much contention, while taking the page lock on every single page would take even longer... and some of the pages might not be allocated yet.

So range locks are being added, not because they are a generally good idea, but because there is a specific use case (managing the internals of files) that seems to justify them. Do you know of a specific in-kernel use case that would significantly benefit from upgradeable locks? (We already have downgradeable locks - see downgrade_write()).

How about update locks?

saffroy — Tue, 06 Jun 2017 22:14:41 +0000

For some reason I'd expect that people would want an API for update locks before range locks. For a definition of update locks, see the "PW" lock mode in https://en.wikipedia.org/wiki/Distributed_lock_manager.

Update locks are typically used in combination with "CR" (aka reader lock) and "EX" (exclusive lock) modes, with an API to atomically convert from update to exclusive. A great use case is when one wants to prepare some kind of structure update privately while letting readers work, and then convert to exclusive and atomically publish the update: this helps minimize the time during which the exclusive lock is held.

Is that kind of lock API discussed?

32-bit systems

davidlohr — Tue, 06 Jun 2017 15:27:18 +0000

I considered this, but ultimately decided it's not worth it just for Lustre. A good example of templates is what interval trees do, and there it's most certainly worth the hassle. In any case the Lustre folks are well aware of the 32-bit situation.

32-bit systems

ianmcc — Tue, 06 Jun 2017 15:01:16 +0000

looks to be screaming out for template <typename int_type> ...

32-bit systems

Paf — Tue, 06 Jun 2017 01:21:23 +0000

The Lustre developers (I'm one, but it wasn't me) pointed this out, actually... A few weeks or maybe months ago now. I'm surprised it hasn't been fixed!

32-bit systems

abatters — Mon, 05 Jun 2017 21:11:21 +0000

The old version of range_lock_init() (in linux/drivers/staging/lustre/lustre/llite/range_lock.c) uses __u64 for the start/end offsets, but this generic version uses "unsigned long", which will limit its range to 4 GB on 32-bit systems. That is fine for mmap(), but would limit its usefulness for "ranges of files", at least for byte-level granularity.
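A quick way to see the limit, assuming "unsigned long" is 32 bits wide (as on typical 32-bit Linux targets): storing a larger file offset in it wraps modulo 2^32. The Python helper below only simulates that arithmetic; `as_unsigned_long_32` is a made-up name, not a kernel function.

```python
U32_MAX = (1 << 32) - 1  # largest value a 32-bit unsigned long can hold

def as_unsigned_long_32(offset):
    """Value left after storing offset in a 32-bit unsigned long (wraps modulo 2**32)."""
    return offset & U32_MAX

five_gib = 5 * 1024**3
print(as_unsigned_long_32(five_gib))  # prints 1073741824, i.e. 1 GiB, not 5 GiB
```

So a byte-granular range starting at 5 GiB would silently alias a range starting at 1 GiB, which is why the older Lustre code's __u64 offsets matter for file ranges.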