Development
User-space RCU
The user-space read-copy update (URCU) library was established in February of 2009, and is now used by several projects and available from several distributions. URCU is similar to its Linux-kernel counterpart, providing a replacement for reader-writer locking, among other uses. This similarity continues with readers not synchronizing directly with RCU updaters, thus making RCU read-side code paths exceedingly fast, while furthermore permitting RCU readers to make useful forward progress even when running concurrently with RCU updaters—and vice versa.
Although people who have worked with the Linux kernel's RCU implementation find much to be familiar with in URCU, there are some important differences, including support for several concurrent data structures. To that end, this article summarizes the similarities and discusses the differences as follows:
- RCU review
- URCU overview
- URCU flavors
- URCU library API
- Using URCU
- Building URCU
- Summary
And then there are, of course, the inescapable answers to the quick quizzes.
Companion articles cover RCU-protected hash tables and stacks and queues.
RCU review
RCU is most frequently used to provide concurrency control for read-mostly linked data structures, so that read-side overhead is minimal. The word “minimal” is no euphemism: In some common cases, RCU's read-side overhead both in the kernel and in user space approaches zero.
Of course, if readers are to have zero overhead, then there logically cannot be any way for updaters to delay them or otherwise prevent them from running. Which in turn means that RCU updaters must handle readers traversing the data structure at all times before, during, and after the update. RCU updaters accomplish this feat by maintaining multiple versions of the data structure, waiting to free up a given old version until such time as all readers that might be referencing that version have moved on.
The following examples show how this works, first for insertion of a new data element into an RCU-protected structure, and then for deletion from an RCU-protected list. As we will see later, RCU can be applied to a great many other data structures as well.
The insertion process is shown in the following diagram, with time progressing through four states from left to right.
The first state shows a global pointer named gptr with value NULL. The first green arrow shows the updater using malloc() to arrive at the second state, in which gptr is still NULL, but where the updater's local variable tmp now references an uninitialized block of memory. The second green arrow shows the updater initializing the newly allocated block of memory, resulting in the third state, where the new structure's fields ->a, ->b, and ->c have the values 1, 2, and 3, respectively. Finally, the red arrow shows the updater assigning a pointer to the new structure to the global variable gptr using rcu_assign_pointer(). The structure is now colored red, indicating that readers might now reference it. This in turn means that any further updates to the new structure must be done carefully, taking the possibility of concurrent readers into account.
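To make this concrete, a minimal sketch of the update-side code follows. The struct foo type and field values come from the diagram above; the header choice and the bare-bones error handling are illustrative assumptions rather than anything URCU requires:

#include <stdlib.h>
#include <urcu.h>    /* rcu_assign_pointer(); any URCU flavor provides it. */

struct foo {
    int a;
    int b;
    int c;
};

struct foo *gptr = NULL;

void publish_foo(void)
{
    struct foo *tmp = malloc(sizeof(*tmp));    /* First green arrow. */

    if (!tmp)
        abort();    /* Real code would recover more gracefully. */
    tmp->a = 1;     /* Second green arrow: initialize while still private... */
    tmp->b = 2;
    tmp->c = 3;
    rcu_assign_pointer(gptr, tmp);    /* Red arrow: publish to readers. */
}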
Quick Quiz 1: Why use rcu_assign_pointer() rather than the simple gptr = tmp assignment statement that the C language provides for this straightforward job???
The process of deleting an element from an RCU-protected linked list is as follows, again with time progressing through four states from left to right.
The first state shows a linked list with elements A, B, and C. The first (red) arrow shows the updater removing element B from the list using cds_list_del_rcu().

Quick Quiz 2: Why cds_list_del_rcu() instead of simply list_del_rcu() like the Linux kernel does?
In the second state, old readers might still be accessing element B, but new readers have no path to it, hence the yellow color. This means that the list has two versions: old readers will see the list containing A, B, and C, while at the same time new readers will see the list containing A and C. Eventually, all the old readers still accessing element B will move to element C, but until then, element B cannot be freed. We clearly need some way to wait until all of these old readers have finished, and synchronize_rcu() does just that, waiting for any reader that was already in progress at the time that synchronize_rcu() started executing.
Quick Quiz 3: And just how is synchronize_rcu() supposed to know when all the currently executing RCU readers started? Doesn't that mean that all the RCU readers must do heavyweight operations to get the current time and store it somewhere? And doesn't this also require perfect time synchronization?
After synchronize_rcu() completes, the list will be as shown in the third column. Element B is green to indicate that readers can no longer have a reference to it, so that all readers see the list as containing only elements A and C. It is thus safe to free() element B, resulting in the final state shown on the right.
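Putting these steps together, the removal might be sketched as follows. The struct foo element type, its embedded struct cds_list_head, and the list_lock mutex serializing updaters are illustrative assumptions; only the cds_list_del_rcu(), synchronize_rcu(), and free() calls are the pattern being described:

#include <pthread.h>
#include <stdlib.h>
#include <urcu.h>
#include <urcu/rculist.h>

struct foo {
    int data;
    struct cds_list_head list;    /* Linkage into the RCU-protected list. */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;    /* Serializes updaters. */

void remove_foo(struct foo *p)
{
    pthread_mutex_lock(&list_lock);
    cds_list_del_rcu(&p->list);    /* New readers can no longer find p. */
    pthread_mutex_unlock(&list_lock);
    synchronize_rcu();             /* Wait for pre-existing readers to finish. */
    free(p);                       /* Safe: no reader can still hold a reference. */
}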
Of course, for synchronize_rcu() to do its job, there must be some way of determining where RCU readers begin and end. This job is handled by rcu_read_lock(), which marks the beginning of an RCU read-side critical section, and rcu_read_unlock(), which marks the end. RCU read-side critical sections may be nested, and cannot contain synchronize_rcu() (otherwise deadlock would result).
Any line of code not in an RCU read-side critical section is termed a quiescent state, and any period of time during which each reader thread resides in at least one quiescent state is called a grace period.
Quick Quiz 4: Why the “reader” above? Doesn't every thread need to reside in a quiescent state in order for a grace period to have elapsed?
The code to traverse an RCU-protected linked list might look as follows:
rcu_read_lock();
cds_list_for_each_entry_rcu(p, head, list) {
    do_something_with(p->a, p->b, p->c);
}
rcu_read_unlock();
So what? There are any number of linked-list traversal schemes, and this probably does not look all that special. The special thing about RCU-protected list traversal is that the above code often compiles to exactly the same assembly language as does a simple single-threaded linked-list traversal but nevertheless provides full protection against concurrent modification.
Quick Quiz 5: When does RCU-protected list traversal compile to different assembly language than does simple single-threaded linked-list traversal?
More detail on RCU is available:
- User-Level Implementations of Read-Copy Update.
- Structured Deferral: Synchronization via Procrastination.
- What is RCU, Fundamentally?
- What is RCU's Usage?
- The RCU API, 2010 Edition.
- What Is RCU? [PDF]
- Paul E. McKenney's RCU page.
Given this general RCU overview, we are ready to look at URCU in particular.
URCU overview
URCU is a library that is available in a number of Linux distributions as liburcu2 (liburcu1 and liburcu0 for older versions), or that can be downloaded here. An additional liburcu-dev package is required if you wish to develop programs that use RCU. It runs on a number of architectures, including x86, ARM, PowerPC, Alpha, Itanium, Mainframe, and SPARC, and also on several operating systems, including Linux, FreeBSD, and Cygwin. It is also expected to run on Android, NetBSD, OpenBSD, and Darwin, but more testing is required before full support can be claimed.
Along with a number of concurrent data structures, URCU provides five different flavors of RCU for different types of applications:
- QSBR (quiescent-state-based RCU).
- Memory-barrier-based RCU.
- “Bullet-proof” RCU.
- Signal-based RCU.
- Signal-based RCU using an out-of-tree sys_membarrier() system call.
QSBR closely resembles the “classic” RCU implementations (TREE_RCU and TINY_RCU) in the Linux kernel in having zero-overhead implementations of rcu_read_lock() and rcu_read_unlock(). However, each thread must periodically invoke rcu_quiescent_state(), just as in the kernel, where schedule() must be invoked periodically. Each thread that is to execute RCU read-side critical sections must also invoke rcu_register_thread() after thread creation and rcu_unregister_thread() before thread exit. These requirements clearly put a stringent constraint on the overall application design that, for example, prohibits the use of QSBR RCU in most library code, but in return, QSBR provides unmatched performance.
Memory-barrier-based RCU resembles the earliest preemptible RCU implementation accepted into the -rt patchset in that rcu_read_lock() and rcu_read_unlock() contain memory barriers (though not heavyweight atomic instructions!). This situation results in high read-side overhead by RCU standards but, in return, removes QSBR's requirement that all threads periodically invoke anything, thus making memory-barrier-based RCU usable from within library functions, at least in cases where the application is registering its threads with RCU via rcu_register_thread().
Bullet-proof RCU is like memory-barrier-based RCU but, in addition, automatically invokes rcu_register_thread() when needed from the RCU read-side primitives. This further increases these primitives' overheads, but in return allows RCU to be used by libraries that are used by applications that create their own threads such that RCU cannot hook into thread creation and destruction.
Signal-based RCU removes the memory barriers from rcu_read_lock() and rcu_read_unlock(), while still allowing these primitives to be used by library functions. This reduces these primitives' overheads to near those of the QSBR RCU implementation, but requires that the user application give up a POSIX signal to be used by synchronize_rcu() in place of the read-side memory barriers. However, this approach requires that all threads, including those created by library functions, invoke rcu_register_thread() before entering their first RCU read-side critical section.
The final flavor, interrupt-based RCU using sys_membarrier(), uses an experimental sys_membarrier() system call in place of the POSIX signals used by the preceding approach. This sys_membarrier() system call uses inter-processor interrupts, which are lighter weight than POSIX signals. In addition, in contrast to RCU_SIGNAL, which sends a POSIX signal to every thread in the process that has been registered as an RCU reader, RCU_MEMBARRIER interrupts only those threads in this process that are currently running. This optimization works because any blocked threads already executed memory barriers in the process of blocking and will execute more when they wake up.
Each of these flavors of RCU is described in the next section.
URCU flavors
There are five flavors of RCU available from liburcu, each of which can be selected on a per-compilation-unit basis as described below.
The first flavor is QSBR, which is accessed as follows:
#include <urcu-qsbr.h>
It is also necessary to include “-lurcu-qsbr” on the linker command line. QSBR provides the highest performance, but requires that all reader threads periodically indicate quiescent states, either via rcu_quiescent_state() for a momentary quiescent state or via rcu_thread_offline()/rcu_thread_online() pairs for extended quiescent states.
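For example, a QSBR reader thread might be structured roughly as shown below; do_work() is a placeholder for the application's own code, not part of the URCU API:

#include <stdio.h>
#include <urcu-qsbr.h>

static void do_work(int i)    /* Placeholder for real work. */
{
    rcu_read_lock();          /* Compiles to nothing under QSBR... */
    printf("processing item %d\n", i);
    rcu_read_unlock();        /* ...as does this. */
}

static void *reader(void *arg)
{
    int i;

    rcu_register_thread();    /* Required before the first read-side critical section. */
    for (i = 0; i < 1000; i++) {
        do_work(i);
        rcu_quiescent_state();    /* Announce a quiescent state on each pass. */
    }
    rcu_unregister_thread();  /* Required before thread exit. */
    return NULL;
}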
Signal-based RCU is slightly slower than QSBR, but does not require explicitly indicating quiescent states. Instead, the application must allow RCU to have dedicated use of SIGUSR1. Users who are building their own liburcu library can override this default by using -DSIGRCU. Signal-based RCU is accessed as follows:
Signal-based RCU is accessed as follows:
#define RCU_SIGNAL
#include <urcu.h>
It is also necessary to include “-lurcu-signal” on the linker command line.
Memory-barrier-based RCU is somewhat slower than signal-based RCU, but does not require a signal to be dedicated to RCU's use. Instead, this flavor of RCU's read-side primitives contains explicit memory barriers. Memory-barrier-based RCU is accessed as follows:
#define RCU_MB
#include <urcu.h>
It is also necessary to include “-lurcu-mb” on the linker command line.
Some kernels have a sys_membarrier() system call that causes every currently running thread of the current process to execute a memory barrier. Systems with such a system call can gain the performance benefits of signal-based RCU without the need to dedicate a signal to RCU. However, until sys_membarrier() is in mainline, those using it will need to carefully ensure that the kernel and user-space RCU agree on the system-call number.
This variant of RCU is accessed as follows:
#define RCU_MEMBARRIER
#include <urcu.h>
It is also necessary to include “-lurcu” on the linker command line.
The four previous flavors of RCU require that reader threads be initialized using rcu_register_thread() before the first use of rcu_read_lock(), and further that they be cleaned up using rcu_unregister_thread() after the last use of rcu_read_unlock(). These requirements can be problematic in some types of libraries, and can therefore be dispensed with (with an additional slight degradation in performance compared to RCU_MB) by accessing bullet-proof RCU as follows:
#include <urcu-bp.h>
It is also necessary to include “-lurcu-bp” on the linker command line.
For all flavors, defining either of the C preprocessor symbols _LGPL_SOURCE or URCU_INLINE_SMALL_FUNCTIONS before the first #include enables static linking, which gives the compiler more freedom to carry out optimizations, inlining in particular. These optimizations can be particularly important for the lightweight read-side primitives such as rcu_read_lock() and rcu_read_unlock(). In short, static linking provides increased performance for those willing to abide by its licensing and technical constraints. These constraints include the need to distribute “.o” object files with any binary of your application, and the need to provide a separate build of your application for each version of URCU's ABI.
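For example, a compilation unit that wants a statically linked QSBR flavor might begin with nothing more than the following (a minimal sketch):

#define _LGPL_SOURCE
#include <urcu-qsbr.h>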
Quick Quiz 6: Is it OK to use URCU_INLINE_SMALL_FUNCTIONS even in proprietary code?
The tradeoffs between the five flavors are shown in the following table, where blank cells indicate fewer restrictions:
                          QSBR              RCU_MEMBARRIER    RCU_SIGNAL        RCU_MB            Bullet-proof
Header                    <urcu-qsbr.h>     <urcu.h>          <urcu.h>          <urcu.h>          <urcu-bp.h>
Define                                      RCU_MEMBARRIER    RCU_SIGNAL        RCU_MB
Libraries                 -lurcu-qsbr       -lurcu            -lurcu-signal     -lurcu-mb         -lurcu-bp
Read-Side Cost            Free!             ++                ++, test          ++, smp_mb()      ++, smp_mb(), test
Explicit QS Required      Yes
Thread Registry Required  Yes               Yes               Yes               Yes
Kernel Patch                                Yes
POSIX Signal Required                                         Yes
OK From Library           Not in general
OK From Library with unannounced application threads
                          Not in general    Not in general    Not in general    Not in general
The first three rows show the compiler directives and linker libraries needed to access the corresponding RCU flavor. The “Read-Side Cost” row roughly summarizes the read-side cost of each flavor, where “++” indicates a local increment/decrement pair, “test” indicates an if test and branch, and smp_mb() indicates a pair of memory barriers. Thus, although read-side cost does vary, all flavors have extremely low overhead when compared to synchronization primitives based on mutual exclusion. The “Explicit QS Required” row shows which flavors require explicit calls to rcu_quiescent_state(), the “Thread Registry Required” row shows which flavors require explicit rcu_register_thread() and rcu_unregister_thread() calls from threads using RCU's read-side primitives, the “Kernel Patch” row shows which flavors require the sys_membarrier() kernel patch, and the “POSIX Signal Required” row shows which flavors require that the application give up a POSIX signal for RCU's use.

The “OK From Library” row shows which flavors can be used with little or no restriction from library functions intended to be linked with arbitrary applications, and finally, the “OK From Library with Unannounced Application Threads” row shows which flavors can be used with little or no restriction from such library functions, but where the linking application creates threads that cannot be forced to invoke the rcu_register_thread() and rcu_unregister_thread() primitives.
In short, no one of the five RCU flavors is suitable in all situations. The following rough rules of thumb can help select the flavor that is best for you:
- If you are building a standalone application, and if speed is of the essence, then QSBR is probably your best choice.
- If you are building a library or a library function, and you have absolutely no control over thread creation and destruction, then bullet-proof RCU is your only viable choice.
- If you are building a library or a library function, but you are able to hook into thread creation and destruction, then you will need to use one of RCU_MEMBARRIER, RCU_SIGNAL, or RCU_MB, as follows:
  - If you are running Linux and can patch your kernel, then build your kernel with Mathieu Desnoyers's sys_membarrier() patch and use RCU_MEMBARRIER. (You will also need to inform your application of sys_membarrier()'s syscall number, for example, by patching the relevant include files.) If you build with RCU_MEMBARRIER and run on a system lacking sys_membarrier(), URCU will fall back to RCU_MB. Assuming, that is, that the kernel does not contain another custom system call with a number that happens to collide with the choice of sys_membarrier() on the build machine.
  - Otherwise, if you can reserve a signal for your library's use (perhaps in conjunction with any other libraries or library functions wishing to use RCU), then use RCU_SIGNAL.
  - Otherwise, use RCU_MB.
Regardless of the URCU flavor chosen, the API is the same; for example, you will use rcu_read_lock() regardless of the URCU flavor. This is very convenient because it allows common source code to be shared among different programs or libraries that might select different URCU flavors. However, it also means that special care is required when sharing RCU-protected data structures among different libraries that are using different flavors of RCU. In some cases, it is necessary to include pointers to the relevant RCU functions in order to avoid RCU-flavor confusion. That said, in the typical case of a standalone application using RCU, this problem does not arise.
URCU library API
The URCU library's API is surprisingly large, and will therefore be discussed in the following pieces:
- Atomic-operation and utility API
- Details of atomic-operation and utility API focused on memory barriers
- The URCU API
- RCU-Protected Lists
Using URCU
The following sections give some advice on the use of RCU:
When is URCU useful?
RCU is a specialized synchronization mechanism, and its performance, scalability, and real-time response advantages are due to this specialization. It is always important to use the right tool for the job, and RCU is no exception to this rule. The following diagram gives some rules of thumb for where RCU is best used:
As shown in the blue box, RCU works best if you have read-mostly data where stale and inconsistent data is permissible. The canonical example of this case in the Linux kernel is routing tables. Because it may have taken many seconds or even minutes for the routing updates to propagate across the Internet, the system has been sending packets the wrong way for quite some time. Having some small probability of continuing to send some of them the wrong way for a few more milliseconds is almost never a problem. User-level classification and routing data structures are likely to be just as amenable to RCU as those in the Linux kernel.
If you have a read-mostly workload where consistent data is required, RCU works well, as shown by the green box. Examples of this situation are data structures that map from a key to a structure, but where operations on each such structure must be fully synchronized, especially including deletion of the structure. The trick here is to use RCU to protect the mapping data structure and to use a per-structure lock to protect the individual structures being mapped to. In addition, each structure has a flag indicating whether or not it has been deleted. When an RCU-protected search finds a data structure, it acquires that data structure's lock and checks the deletion flag. If the flag is set, the search pretends not to have found the structure; otherwise, the desired operation on the structure is carried out under the protection of the lock.
In this case, RCU provides high performance and scalability for the mapping operation, and also guarantees that the structure will remain in existence: Updaters must wait for a grace period after removing a given structure before they are permitted to free that structure.
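A sketch of this pattern appears below. The struct mapped_obj type, the lookup() search, and operate_on() are hypothetical names used for illustration; only the rcu_read_lock()/rcu_read_unlock() calls and the lock-then-check-flag sequence are the pattern itself:

#include <errno.h>
#include <pthread.h>
#include <urcu.h>

struct mapped_obj {
    pthread_mutex_t lock;    /* Protects this structure's fields. */
    int deleted;             /* Set, under the lock, before removal from the map. */
    /* ... application data ... */
};

struct mapped_obj *lookup(int key);         /* Placeholder: RCU-protected search. */
void operate_on(struct mapped_obj *obj);    /* Placeholder: the real work. */

int do_operation(int key)
{
    struct mapped_obj *obj;
    int ret = -ENOENT;

    rcu_read_lock();
    obj = lookup(key);    /* Existence guaranteed while inside the read-side critical section. */
    if (obj) {
        pthread_mutex_lock(&obj->lock);
        if (!obj->deleted) {    /* Pretend not to see deleted objects. */
            operate_on(obj);
            ret = 0;
        }
        pthread_mutex_unlock(&obj->lock);
    }
    rcu_read_unlock();
    return ret;
}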
As indicated by the yellow box, RCU can also be useful for read-write workloads where consistent data is required, although usually in conjunction with a number of other synchronization primitives, including per-data-structure locks (as above), per-thread locks, sequence locks, and atomic operations.
Finally, as indicated by the red box, update-mostly workloads requiring consistent data are rarely good places to use RCU. As noted in the figure, RCU's existence guarantees can be useful even in update-heavy situations because this allows simpler per-structure locking designs to be constructed, as is in fact the case for lfstack and rculfqueue—even though these data structures typically only have operations that update the structure itself. In effect, RCU is providing type-safe memory for algorithms using non-blocking synchronization. In addition, the low latency of RCU readers can be important to real-time applications in cases where the real-time threads do only reads, and non-real-time threads do updates.
So, in cases where RCU is the right tool for the job, what benefits does it provide?
- Excellent performance and scalability for RCU readers.
- Wait-free RCU readers, resulting in determinism, which can be
quite useful for real-time applications.
- Read-side immunity to lock-based deadlocks.
- Existence guarantees that can simplify design of algorithms
that use per-data-structure locks.
- RCU read-side critical sections can be freely nested, enabling
high levels of composability.
- Because RCU read-side critical sections are not subject to
rollback and retry, they can contain irrevocable operations
such as I/O operations (in contrast to many
transactional-memory implementations).
- RCU readers and writers can make concurrent forward progress,
even in the presence of conflicting operations.
In contrast, many other synchronization operations impose
some flavor of mutual exclusion, so that conflicts result
in spinning, blocking, rollbacks, or retries.
- RCU is independent of choice of memory allocator because it does not need to distinguish between pointers to the beginning of a given object and pointers internal to that object (in contrast with hazard pointers).
In short, within its area of applicability, RCU is a useful and powerful tool.
Quick Quiz 7: You say “read-side immunity to lock-based deadlocks”. So what non-lock-based deadlocks can involve RCU readers?

Quick Quiz 8: But given that most of the URCU implementations count the read-side nesting level, composability cannot be perfect, can it?
Application quiescent states
Kernels can use a context switch as a quiescent state, and many applications have similar natural quiescent states. However, just as RCU has hooks into the Linux scheduler in order to determine when context switches happen, applications using the QSBR flavor of URCU must tell RCU where those naturally occurring quiescent states are located using the rcu_quiescent_state(), rcu_thread_offline(), and rcu_thread_online() functions. For example, consider an application with a work-processing loop, such as shown below:
Examples of such applications include those using work-stealing task frameworks. Each thread of this application processes one work item on each pass through its loop, then fetches the next work item. Because RCU read-side critical sections will not span units of work, a natural quiescent state occurs when each work item completes execution, as shown on the right-hand side of the diagram. The code represented by the green boxes can contain RCU read-side critical sections.
Quick Quiz 9: Can the code in the white rcu_quiescent_state() box contain RCU read-side critical sections?
In the typical work-stealing task framework, there is almost always work immediately available until the application terminates. In contrast, event-based systems might wait an arbitrary length of time for the next work item to arrive. URCU clearly needs to understand that a thread waiting on work to arrive is in a quiescent state, and the rcu_thread_offline() and rcu_thread_online() functions allow the application to tell URCU just that, as shown below:

The application executes rcu_thread_offline() after executing each work item, and then executes rcu_thread_online() immediately before starting execution of the next work item. Therefore, RCU read-side critical sections are permitted when executing a work item, but prohibited while waiting for the next work item.
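A sketch of such a worker loop follows; wait_for_work() and process_work() are placeholders for the application's own code, and only the URCU calls named above are part of the pattern:

#include <urcu-qsbr.h>

struct work;                          /* Hypothetical work-item type. */
struct work *wait_for_work(void);     /* Placeholder: may block indefinitely. */
void process_work(struct work *w);    /* Placeholder: may contain RCU readers. */

static void *worker(void *arg)
{
    struct work *w;

    rcu_register_thread();
    for (;;) {
        rcu_thread_offline();    /* Extended quiescent state while waiting... */
        w = wait_for_work();     /* ...so no RCU read-side critical sections here. */
        rcu_thread_online();     /* Back online: RCU readers permitted again. */
        if (!w)
            break;
        process_work(w);         /* May contain rcu_read_lock()/rcu_read_unlock(). */
    }
    rcu_unregister_thread();
    return NULL;
}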
In this way, the rcu_quiescent_state(), rcu_thread_offline(), and rcu_thread_online() functions allow the application to communicate the pertinent details of that application's structure to RCU.
Quick Quiz 10: But what if some portions of the “Wait For Next Work Item” code needed to use RCU read-side critical sections?
However, please note that these three functions are necessary only for the QSBR flavor of RCU. In the other four flavors, RCU is instead explicitly notified of the beginning and end of each RCU read-side critical section, so that the rcu_quiescent_state(), rcu_thread_offline(), and rcu_thread_online() functions are no-ops. Therefore, if your application does not have convenient quiescent states, you should use one of the other four RCU flavors.
Mixing and matching URCU flavors
Different compilation units can be using different flavors of RCU concurrently. This works extremely well when different modules or libraries are using RCU to access internal data structures: Each such module or library selects its flavor of RCU and uses it, and the multiple flavors operate independently as desired. One use case for this behavior is a tracer library for user-space applications and libraries (LTTng-UST), which in turn must use RCU-bp because it can neither control the application's threads nor even hook into their creation and destruction. This use case is supported quite naturally.
That said, it is necessary to take some care when passing RCU-protected data structures from one compilation unit to another. In some cases, it may be necessary to wrap the RCU functions in order to ensure that they are compiled in the correct compilation unit. In other cases, including the hash table described here, it is necessary to maintain function pointers to the selected RCU flavor's API members in order to allow the data structure to be safely manipulated by other compilation units that neither know nor care what RCU flavor the data structure is using.
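One simple way to do such wrapping, shown here as a hypothetical sketch rather than liburcu's own mechanism, is for the compilation unit that selected the flavor to export it behind function pointers:

#define RCU_MB        /* This compilation unit selected the RCU_MB flavor. */
#include <urcu.h>

/* Hypothetical flavor descriptor passed along with the shared data structure. */
struct my_rcu_ops {
    void (*read_lock)(void);
    void (*read_unlock)(void);
    void (*synchronize)(void);
};

static void my_read_lock(void)   { rcu_read_lock(); }
static void my_read_unlock(void) { rcu_read_unlock(); }
static void my_synchronize(void) { synchronize_rcu(); }

const struct my_rcu_ops my_mb_rcu_ops = {
    .read_lock   = my_read_lock,
    .read_unlock = my_read_unlock,
    .synchronize = my_synchronize,
};

Other compilation units can then invoke RCU through my_mb_rcu_ops without knowing which flavor lies behind it.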
Building URCU
A number of Linux distributions provide the URCU library, including the Debian family (e.g., Ubuntu), Fedora, openSUSE, SUSE Enterprise RT Linux, Linaro, and all Yocto-based RT distros (e.g., Wind River Linux, Mentor Embedded Linux, Montavista, and ElinOS). However, if you want features and data structures from the latest version of the URCU library, you will need to download and build it.
The URCU library may be downloaded here. It can also be cloned or viewed using:
git://git.liburcu.org/urcu.git
https://github.com/urcu/userspace-rcu
Once downloaded, follow the build instructions in the README.
Summary
The user-space RCU library has been in existence for almost five years, and has grown from providing just RCU itself to also providing RCU-protected data structures, including a resizable hash table, a stack, and a couple of queues. We expect that more data structures will make their appearance over time, with possibilities including Judy arrays [PDF], bonsai trees [PDF], and many more besides. Other hoped-for enhancements include improved IPC (user-space wait_event() and wake_up(), anyone?) and, of course, a locking validator similar to the Linux kernel's lockdep.
This library has been used by a number of projects, including LTTng-UST (User-Space Tracer) (for which it was first constructed), Knot DNS, the gdnsd geographic DNS server, the netsniff-ng toolkit, the Sheepdog Project, and an academic project on crowd simulation, as well as a number of commercial applications.
Acknowledgments
We are grateful to Jérémie Galarneau, Josh Triplett, and Frédéric Weisbecker for their comments on a draft of this article. We owe thanks to Jim Wasko for his support of this effort.
Answers to Quick Quizzes
Quick Quiz 1: Why use rcu_assign_pointer() rather than the simple gptr = tmp assignment statement that the C language provides for this straightforward job???
Answer: Unfortunately, this job is not so straightforward. Remember that readers might be accessing gptr at any time: There is no way to stop them. We therefore need to prevent both the compiler and the CPU from mischievously reordering the memory references, which they are both otherwise fully within their rights to do. To see why this is important, suppose that either the compiler or the CPU decided to optimize the code by moving the gptr = tmp assignment statement so that it preceded the structure-initialization assignment statements. Then the concurrent readers might well load from gptr and get a pointer to a structure full of pre-initialization garbage. Not good!!! Therefore, we use rcu_assign_pointer(gptr, tmp), which acts as a mischief-preventing assignment statement.
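The read side needs matching care: rather than a plain load from gptr, readers use rcu_dereference(), which pairs with rcu_assign_pointer() to rule out the analogous misorderings when the pointer is loaded. A minimal sketch, reusing the struct foo and gptr from the insertion example along with a placeholder do_something_with():

#include <urcu.h>

struct foo { int a; int b; int c; };            /* Same layout as the insertion example. */
extern struct foo *gptr;                        /* Published via rcu_assign_pointer(). */
void do_something_with(int a, int b, int c);    /* Placeholder. */

void read_foo(void)
{
    struct foo *p;

    rcu_read_lock();
    p = rcu_dereference(gptr);    /* Ordered load, paired with rcu_assign_pointer(). */
    if (p)
        do_something_with(p->a, p->b, p->c);
    rcu_read_unlock();
}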
Quick Quiz 2: Why cds_list_del_rcu() instead of simply list_del_rcu() like the Linux kernel does?
Answer: We could not take the Linux kernel list primitives directly due to license issues, namely the Linux kernel's GPLv2 license as opposed to the user-space RCU library's LGPLv2.1-or-later license. Instead, the GNU C library's list primitives were pressed into service. There are a few differences from the kernel, and there will be more as the kernel's API evolves, so the cds_ prefix serves as a helpful reminder. In addition, names like list_add() and list_del() are already used by quite a few user applications, and the cds_ prefix helps avoid symbol-name clashes.
Quick Quiz 3: And just how is synchronize_rcu() supposed to know when all the currently executing RCU readers started? Doesn't that mean that all the RCU readers must do heavyweight operations to get the current time and store it somewhere? And doesn't this also require perfect time synchronization?
Answer: There are a number of ways for synchronize_rcu() to do its job, for example using the “Classic RCU” algorithm described here. Heavyweight read-side operations and perfect time synchronization are not required, although in-kernel RCU implementations have been known to complain when CPUs' notions of the current time differ by more than one minute.
Quick Quiz 4: Why the “reader” above? Doesn't every thread need to reside in a quiescent state in order for a grace period to have elapsed?
Answer: The URCU Library API section introduces the rcu_register_thread() and rcu_unregister_thread() APIs. A reader thread is a thread that has invoked rcu_register_thread() and not yet invoked rcu_unregister_thread(). All other threads are forbidden from executing RCU read-side critical sections and are ignored by RCU.
Quick Quiz 5: When does RCU-protected list traversal compile to different assembly language than does simple single-threaded linked-list traversal?
Answer: In some flavors of user-space RCU, rcu_read_lock() and rcu_read_unlock() produce some lightweight code (more on this later). In all flavors of user-space RCU, cds_list_for_each_entry_rcu() uses the CMM_ACCESS_ONCE() macro (similar to ACCESS_ONCE() in the Linux kernel), which can disable some code-motion compiler optimizations. Finally, when running on DEC Alpha, cds_list_for_each_entry_rcu() contains a memory barrier.
Quick Quiz 6: Is it OK to use URCU_INLINE_SMALL_FUNCTIONS even in proprietary code?
Answer: Yes, it is. Use of URCU_INLINE_SMALL_FUNCTIONS inlines only small functions and macros that fit within LGPL's 10-line inlining limit for non-LGPL code. See paragraph 4 of clause 5 of LGPL v2.1, which reads:
If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.)
Quick Quiz 7: You say “read-side immunity to lock-based deadlocks”. So what non-lock-based deadlocks can involve RCU readers?
Answer: Any sequence of events that can result in an RCU reader waiting for an RCU grace period results in deadlock in all RCU flavors except for QSBR. In QSBR, this bug results in splitting of the RCU read-side critical section containing the operation that waits on a grace period. Note that acquiring a lock that is held across a grace period can indirectly result in waiting for a grace period. There are of course a number of more elaborate ways of arriving at this sort of deadlock.
Quick Quiz 8: But given that most of the URCU implementations count the read-side nesting level, composability cannot be perfect, can it?
Answer: Indeed it cannot. This is an imperfect world, beset by finite-sized counters. And this problem is not restricted to RCU: Other composable synchronization operations have implementation-defined limits to nesting depth, and thus limits on composability.
Quick Quiz 9: Can the code in the white rcu_quiescent_state() box contain RCU read-side critical sections?
Answer: That is a matter that is internal to the RCU implementation. That said, any RCU read-side critical sections must be outside the area of code where RCU is ignoring this thread.
Quick Quiz 10: But what if some portions of the “Wait For Next Work Item” code needed to use RCU read-side critical sections?
Answer: Then the calls to rcu_thread_offline() and rcu_thread_online() would need to be moved into the “Wait For Next Work Item” code.
Brief items
Quote of the week
PyDev 3.0 released
Version 3.0 of PyDev has been released, supporting Eclipse 3.7 or 4.3 and Java 7. Highlights include support for the Computational Crystallography Toolbox (CCTBX) and IPython 1.0, synchronization between the PyDev and Python interpreters, and numerous changes to the debugger, interpreter, and interactive console.
Twisted 13.2.0 available
Version 13.2 of the Twisted framework has been released. This version adds support for "a HostnameEndpoint implementation which uses IPv4 and IPv6 in parallel, speeding up the connection by using whichever connects first (the 'Happy Eyeballs'/RFC 6555 algorithm)", as well as featuring recent contributions from Google Summer of Code and Outreach Program for Women students.

NumPy 1.8.0 released

Version 1.8.0 of the NumPy Python extension for mathematical functions has been released. The release merges Python 2 and Python 3 support into a unified code base. Included among the changes are linear algebra functions on stacked arrays, functions to create value initialized arrays, and statistical functions that automatically skip NaNs.
Newsletters and articles
Development newsletters from the past week
- Caml Weekly News (November 12)
- What's cooking in git.git (November 6)
- What's cooking in git.git (November 11)
- What's cooking in git.git (November 13)
- Haskell Weekly News (November 6)
- Haskell Weekly News (November 12)
- Perl Weekly (November 10)
- PostgreSQL Weekly News (November 10)
- Ruby Weekly (November 7)
- Tor Weekly News (November 13)