Making write barriers actually work
[Posted October 15, 2003 by corbet]
Certain kernel subsystems - journaling filesystems in particular - have some
strict requirements about how their disk I/O operations are ordered. Open
transactions must be committed to the journal before the actual filesystem
structure can be touched. If this requirement is not met, the integrity of
the filesystem could be lost if a crash happens at the wrong time.
One way to implement ordering is to explicitly wait on the buffers that
must make it to disk. If no new operations are submitted before the old
ones complete, the ordering requirements will be met (though write caching
in disk drives can create problems of its own). This waiting is hard on
performance, however; the filesystem would be better off queuing new
requests than sitting idle until the old ones complete.
As a way of improving journaling filesystem performance, the design goals for
the block layer rework in 2.5 included write barriers. A write barrier is
simply a specially marked I/O request; the block layer will not reorder any
other request past a barrier request in either direction. In this way, all
requests issued prior to the barrier request are guaranteed to be completed
before any requests issued after the barrier are begun. With this feature,
a journaling system can simply issue a barrier request when it commits its
journal, then go on with implementing the next transaction.
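The ordering rule can be illustrated with a small userspace model. This is only a sketch, not kernel code: struct toy_req and toy_schedule() are hypothetical names invented for this example, and sorting by sector is a crude stand-in for whatever reordering an I/O scheduler might really do. The point is simply that requests may be shuffled between barriers, but never across one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy request; NOT the kernel's struct request. */
struct toy_req {
    unsigned long sector;  /* target sector on disk */
    int is_barrier;        /* nonzero marks a barrier request */
};

/* qsort() comparison: ascending sector order. */
static int by_sector(const void *a, const void *b)
{
    const struct toy_req *x = a, *y = b;
    return (x->sector > y->sector) - (x->sector < y->sector);
}

/*
 * Reorder requests in place for dispatch.  Requests lying between two
 * barriers are sorted by sector, but no request moves across a barrier
 * and every barrier keeps its position in the array.
 */
void toy_schedule(struct toy_req *reqs, size_t n)
{
    size_t start = 0;

    assert(reqs != NULL || n == 0);
    for (size_t i = 0; i <= n; i++) {
        if (i == n || reqs[i].is_barrier) {
            if (i > start)
                qsort(reqs + start, i - start, sizeof(*reqs), by_sector);
            start = i + 1;  /* next segment begins after the barrier */
        }
    }
}
```

Given requests for sectors 50 and 10, a barrier, then sectors 40 and 20, this model will swap the members of each pair but leave both pairs on their own side of the barrier.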
The problem is that barriers don't actually work yet. That little
shortcoming shouldn't last much longer, however, now that Jens Axboe has dusted off his write barrier patch and is
actively working on it again.
Barrier requests still work pretty much as described in the LWN Driver Porting series. A driver which
honors barriers must now inform the block layer of that fact, however, with
a call to:
void blk_queue_ordered(request_queue_t *queue, int flag);
where flag is QUEUE_ORDERED_NONE if the device does not
support barriers (the default), QUEUE_ORDERED_TAG if barriers are
implemented with ordered command tags, or QUEUE_ORDERED_FLUSH if
an explicit hardware flush command is used. If higher-level code attempts
to create a barrier request for a device which does not support them, the
block layer will return an error.
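The declaration-then-check behavior described above might be modeled in userspace as follows. Everything here is an assumption made for illustration: toy_queue, toy_queue_ordered(), toy_submit() and the TOY_ORDERED_* values are hypothetical stand-ins, not the patch's actual structures, and the -1 return value is arbitrary since the error the block layer really returns is not specified here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the three ordering modes; values are arbitrary. */
enum toy_ordered {
    TOY_ORDERED_NONE,   /* no barrier support (the default) */
    TOY_ORDERED_TAG,    /* barriers via ordered command tags */
    TOY_ORDERED_FLUSH   /* barriers via an explicit flush command */
};

/* Hypothetical stand-in for a request queue. */
struct toy_queue {
    enum toy_ordered ordered;
};

/* What a driver would call at setup time to declare barrier support. */
void toy_queue_ordered(struct toy_queue *q, enum toy_ordered flag)
{
    assert(q != NULL);
    q->ordered = flag;
}

/*
 * Submit a request.  A barrier sent to a queue that never declared
 * support is refused (-1 is an assumed error value); anything else
 * is accepted and 0 is returned.
 */
int toy_submit(const struct toy_queue *q, int is_barrier)
{
    assert(q != NULL);
    if (is_barrier && q->ordered == TOY_ORDERED_NONE)
        return -1;
    return 0;
}
```

In this model, a filesystem that gets the error back would have to fall back to waiting on its buffers explicitly.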
The code does not currently appear to care which of the two methods a
driver says it implements, as long as it picks one.
Also included with the patch is a barrier implementation for IDE drives
(using QUEUE_ORDERED_FLUSH) and simple patches to a couple of
filesystems to make them use the barrier feature. Now it's mostly a matter
of waiting to see whether Linus considers barriers to be a
stability-related patch.