|| ||Tejun Heo <firstname.lastname@example.org>|
|| ||email@example.com, firstname.lastname@example.org|
|| ||[PATCH linux-2.6 00/04] brsem: [RFC] big reader semaphore|
|| ||Sun, 25 Sep 2005 15:43:07 +0900 (KST)|
This patchset implements brsem - big reader semaphore. This is
similar to the late big reader lock in its concept. Reader side tries
to do as little as possible while writer side takes over all the
overhead. brsem is to be used for cases where...
a. writer operation is *extremely* rare compared to read ops
b. both readers and writers should be able to sleep
c. stale data or reader-side retry is impossible or impractical
Device or subsystem hotplug/unplug synchronization fits very well to
above criteria. Plugging and unplugging occur very rarely but they
still need stringent synchronization.
brsem is implemented while trying to solve the following race
condition in the vfs layer.
super->s_umount rwsem proctects filesystem unmounts and remounts
against other operations. When remounting a rw filesystem ro, after
write-locking s_umount, do_remount_sb() is invoked, which scans all
open files and either refuses to remount or bang all write-open files
depending on force argument. After that, the filesystem is remounted
The problem is that a file can be opened rw after all open files are
scanned but before the filesystem is remounted ro. This can be easily
exposed by adding msleep(2000) between open file scan and remount_fs()
and doing the following in the user space. An ext2 fs is mounted on
# mount -o remount,ro tmp& sleep 1; cat >> tmp/asdf
This leaves us with a write-opened file on a ro filesystem. ext2
happily writes data to disk. I've only verified open(2) but this race
probably exists for other file operations, too.
I'll post a patch to add 2s sleep during remounting and test results
in a reply to this mail.
The solution is to make permission check and the rest of open atomic
against remounting by read-locking s_umount rwsem. This probably
should be done for most file operations including write(2) (maybe even
when committing dirty pages to fs for rw-mmapped files in 'force'
case). This is an expensive overhead to pay for such rare cases and
seqlock / rcupdate cannot be used here.
So, brsem is implemented. Read locking and unlocking involve only
local_bh_disable/enable, a few local arithmetics and conditionals - no
atomic ops, no shared word, no memory barrier. Writer operations are
*very* expensive. They have to issue workqueue works to each cpu
using smp_call_function and wait for them.
This patchset also converts cpucontrol semaphore which protects cpu
hotplug/unplugging events, which obviously occur very rarely.
[ Start of patch descriptions ]
: implement big reader semaphore
This patch implements big reader semaphore - a rwsem with very
cheap reader-side operations and very expensive writer-side
operations. For details, please read comments at the top of
: convert super_block->s_umount to brsem
Convert super_block->s_umount from rwsem to brsem.
: fix ro-remount <-> open race condition
A file can be opened rw while ro-remounting is in progress
resulting in open rw file on ro mounted filesystem. This
patch kills the race by making permission check and the rest
of open atomic w.r.t. remounting. Other file operations also
have this race condition and should be fixed in similar
: convert cpucontrol to brsem
cpucontrol synchronizes cpu hotplugging and used to be a
semaphore. This patch converts it to brsem.
[ End of patch descriptions ]