LWN.net Logo

Fast reader/writer locks

The Linux kernel contains a number of primitives for controlling mutual exclusion. Semaphores and spinlocks (in several varieties) have been around for a while, and the read-copy-update mechanism was added in the 2.5 series. Yet another mechanism, called "fast reader/writer locks," has found its way into Andrew Morton's -mm patch set, and appears likely to be forwarded on to Linus soon for inclusion. So this seems like as good a time as any to look at how "frlocks" work.

Frlocks, as implemented by Stephen Hemminger, are aimed at solving a couple of problems with the gettimeofday() system call. One is simple performance; gettimeofday() is not particularly slow, but some applications (including anything using the X Window System) call it frequently. It also turns out that the current implementation, which uses reader/writer spinlocks, is susceptible to a denial of service problem. Frequent calls to gettimeofday() can delay or lock out timer tick updates.

The frlock patch works by not blocking readers or writers at all. Code wishing write access to the protected data structure is given that access immediately (at least, in the absence of other writers), so there is no way that time updates can be blocked or delayed. Readers, too, get immediate access to the data structure. The catch is that readers must be prepared to retry the access if collides with a writer for access to the data.

The lock works by maintaining "pre" and "post" sequence numbers. A writer process does the following:

  • Take out a spinlock associated with the frlock
  • Increment the "pre" sequence
  • Mess around with the data structure
  • Increment the "post" sequence
  • Release the spinlock

Readers do something like the following:

  • Remember the lock's "post" sequence number
  • Grab the data of interest
  • Ensure that the lock's "pre" sequence matches the remembered "post" sequence. If not, go back to the beginning.

In other words, as long as the sequence numbers match, the reader knows that no writer changed the data while the reader was doing its thing. In practice, the reader side tends to be expressed in code like:

	do {
		seq = fr_read_begin(&some_lock);
		/* Copy the data */
	} while (seq != fr_read_end(&some_lock));

Frlocks, clearly, will not be suitable for lengthy calculations, or those which have immediate side effects. In cases where a small data structure is changed infrequently but read often, however, frlocks may be the key to improved performance. In the introduction to the latest set of frlock patches, Stephen claims an 18% improvement in the speed of gettimeofday() - and the elimination of the timer tick lockout problem.


(Log in to post comments)

Fast reader/writer locks

Posted Jan 30, 2003 4:59 UTC (Thu) by rickfdd (guest, #4519) [Link]

Can't the pre and post sequence numbers can be combined
to one memory location using odd and even values instead? It
would enable earlier failed read detection, which can spin or
yield as appropriate, although the branch cost is usually
more than the critical section anyway.

Fast reader/writer locks

Posted Jan 30, 2003 5:16 UTC (Thu) by cpeterso (guest, #305) [Link]

I think there would be a race condition if the reader just looked for even/odd sequence number changes. If the sequence number was increments TWICE (by one or multiple writers), then the reader might not recognize the data change event.

Fast reader/writer locks

Posted Jan 30, 2003 13:43 UTC (Thu) by bastiaan (guest, #5170) [Link]

I guess what Rick means is that if you use one sequence the reader will know a concurrent write is in progress if the sequence is odd, in which case it can retry without first copying the data. In code:

do {
   while((seq = fr_read_seq(&some_lock)) & 1);
   /* copy data */
} while (seq != fr_read_seq(&some_lock));

However, since the concurrent write is very unlikely to happen, this extra test may be not worthwhile. But I still don't see why Stephen uses two sequences instead of one.

Fast reader/writer locks

Posted Jan 30, 2003 18:13 UTC (Thu) by mwh63 (guest, #321) [Link]

> Frlocks, clearly, will not be suitable lengthy calculations, ...

Shouldn't that read "suitable *for* lengthy calculations" ?

Fast reader/writer locks

Posted Jan 30, 2003 21:09 UTC (Thu) by corbet (editor, #1) [Link]

But it does read that way...

 

 

...now...

Fast reader/writer locks

Posted Feb 2, 2003 3:44 UTC (Sun) by wolfrider (guest, #3105) [Link]

[G] You rock, Corbet. :)

Fast reader/writer locks

Posted Jan 31, 2003 12:19 UTC (Fri) by tres (guest, #352) [Link]

In lengthy calculations the reads will not complete successfully very often using this method.

livelock?

Posted Jan 30, 2003 20:07 UTC (Thu) by cpeterso (guest, #305) [Link]

I could imagine a livelock problem that would not exist using regular spinlocks. A reader could (theoretically) spin forever if writers keep updating the data. The reader would keep looping, trying to get the most up to date version of the data. If the reader used spinlocks, it would get its turn to read (and then the data would be overwritten and outdated by the many writers).

livelock?

Posted Jan 30, 2003 20:49 UTC (Thu) by hensema (guest, #980) [Link]

Theoretically possible, yes. But this kind of lock is really just meant to be used in situations where relatively little writes are happening. Livelock is thus avoided by programming practice and not by design.

What is a spinlock?

Posted Jan 30, 2003 22:51 UTC (Thu) by perlid (guest, #6533) [Link]

Btw, can anyone explain to me what a spinlock is?
I've done some multithreaded programming, so I know a bit about locks, but I don't know how a spinlock is constructed, and what the differences are between a spinlock and other locks.

What is a spinlock?

Posted Jan 30, 2003 22:54 UTC (Thu) by corbet (editor, #1) [Link]

A spinlock is a lightweight lock with no wait queues; if there is contention for the lock one (or more) threads will "spin" (busy-wait) until the lock is free. They are usually built with some sort of atomic test-and-set instruction; a quick look in the kernel source will show how they are done for any particular architecture.

What is a spinlock?

Posted Jan 31, 2003 18:19 UTC (Fri) by cpeterso (guest, #305) [Link]

With a normal lock, a thread will be put to sleep when it has to wait to obtain the lock. When the lock is available, the thread will be woken up.

As described, a thread will "spin" or "busy-wait" for a spinlock. Thus spinlocks should only be used for very short critical sections. If the spinning thread has to wait a long time, your CPU usage will hit 100% because the spinning thread is executing a very tight loop. That is obviously bad for performance.

Two sequence numbers

Posted Feb 2, 2003 5:02 UTC (Sun) by Ross (subscriber, #4065) [Link]

Are there really two sequence numbers (one pre and one post) as the article implies? If so, why?

I don't see why this wouldn't work with only one sequence number inside the lock. The interface for readers would be the same. Is this to avoid needing atomic reads or something?

Yes, yes, I know I can go read the code.
But I'd be much happier if someone who understands this would explain it. ;)

Re: Two sequence numbers

Posted Feb 6, 2003 5:46 UTC (Thu) by Ross (subscriber, #4065) [Link]

I think I got my answer. It changed to the way I suspected it could sometime since last week.

From http://lwn.net/Articles/21812/:

"Changes:
- name change frlock to seqlock
- separate the locking and counter only data types
- use one counter instead of two to keep track of changes."

Fast reader/writer locks

Posted Feb 2, 2003 8:24 UTC (Sun) by acristianb (guest, #1702) [Link]

If the writer starts by increasing the first seq then, the reader reads the first seq, reads data, then the writer changes data and second seq, than the reader reads the last seq, the two sequences match for the reader but the data is invalid.

A small twist of the reader algorithm that will solve this, and would make this frlock suitable for long (or longer) calculations is this:

* reads first seq and second seq
* compares them (if they are not equal start over because a writer is changing data)
* reads data
* reads second seq
* compares first and second sequences (if they are not equal start over)

Fast reader/writer locks

Posted Feb 6, 2003 4:37 UTC (Thu) by dneto (guest, #4954) [Link]

You've got the reader's seq ordering backward.
The writer updates seq pre, then seq post.
The reader checks seq post, then seq pre.
I think that avoids the problem you mention.

Fast reader/writer locks

Posted Feb 13, 2003 19:16 UTC (Thu) by metacircles (guest, #8895) [Link]

What happened to the vsyscall work that Andrea (I think) was doing? From memory, it used a similar pair-of-counters scheme for updates, but it also avoided the need for an actual syscall, by storing the data (and counters) on a page visible in both kernel and user-space

I thought it had been merged, at least in the x86-64 port. This was a while ago, though.

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds