
Making write barriers actually work

Certain kernel subsystems - journaling filesystems in particular - have some strict requirements about how their disk I/O operations are ordered. Open transactions must be committed to the journal before the actual filesystem structure can be touched. If this requirement is not met, the integrity of the filesystem could be lost if a crash happens at the wrong time.

One way to implement ordering is to explicitly wait on the buffers that must make it to disk. If no new operations are submitted before the old ones complete, the ordering requirements will be met (though write caching in disk drives can create problems of their own). This waiting is hard on performance, however; the filesystem would be better off queueing further requests than sitting idle while the old ones complete.

As a way of improving journaling filesystem performance, the design goals for the block layer rework in 2.5 included write barriers. A write barrier is simply a specially marked I/O request; the block layer will not reorder any other request past a barrier request in either direction. In this way, all requests issued prior to the barrier request are guaranteed to be completed before any requests issued after the barrier are begun. With this feature, a journaling system can simply issue a barrier request when it commits its journal, then move on to the next transaction.

The problem is that barriers don't actually work yet. That little shortcoming shouldn't last much longer, however, now that Jens Axboe has dusted off his write barrier patch and is actively working on it again.

Barrier requests still work pretty much as described in the LWN Driver Porting series. A driver which honors barriers must now inform the block layer of that fact, however, with a call to:

    void blk_queue_ordered(request_queue_t *queue, int flag);

where flag is QUEUE_ORDERED_NONE if the device does not support barriers (the default), QUEUE_ORDERED_TAG if barriers are implemented with ordered command tags, or QUEUE_ORDERED_FLUSH if an explicit hardware flush command is used. If higher-level code attempts to create a barrier request for a device which does not support them, the block layer will return an error. The code does not currently appear to care which of the two methods a driver says it implements, as long as it picks one.

Also included with the patch is a barrier implementation for IDE drives (using QUEUE_ORDERED_FLUSH) and simple patches to a couple of filesystems to make them use the barrier feature. Now it's mostly a matter of waiting to see whether Linus considers barriers to be a stability-related patch.



Making write barriers actually work

Posted Oct 16, 2003 8:51 UTC (Thu) by daniel (subscriber, #3181)

Hi Jon,

"Now it's mostly a matter of waiting to see whether Linus considers barriers to be a stability-related patch."

Stranger things have happened; however, it is clearly a performance patch.

Regards,

Daniel

Re: Making write barriers actually work

Posted Oct 16, 2003 9:41 UTC (Thu) by axboe (subscriber, #904)

Hmm, odd; I consider it mainly a data integrity patch. There are no performance gains in the version I sent out, but it sure is a lot safer. The fact that you can get performance gains as well is just an extra future bonus.

Re: Making write barriers actually work

Posted Oct 16, 2003 11:02 UTC (Thu) by daniel (subscriber, #3181)

"I consider it mainly a data integrity patch."

For correctness it is only essential that barrier requests are properly failed for devices that do not support them and that queues properly indicate that barriers are not supported.

We've gotten this far without barriers, we could possibly manage to wait for 2.6.1 :-)

Re: Making write barriers actually work

Posted Oct 17, 2003 10:21 UTC (Fri) by axboe (subscriber, #904)

For correctness, it is essential that data is on disk when the fs thinks it is. And right now it isn't. I think the problem is bigger than you think. Just be glad your power supply is stable :-)

But please use your 'we' carefully; say 'I' if you are just referring to yourself. At least SuSE and EMC care enough about customer data integrity to have been using the patches for a long time on the 2.4 base.

Re: Making write barriers actually work

Posted Oct 20, 2003 13:32 UTC (Mon) by daniel (subscriber, #3181)

"But please use your 'we' carefully; say 'I' if you are just referring to yourself. At least SuSE and EMC care enough about customer data integrity to have been using the patches for a long time on the 2.4 base."

But you are still mixing up performance with integrity. For integrity you only need these four lines:

    +    if (barrier && (q->ordered == QUEUE_ORDERED_NONE)) {
    +        err = -EOPNOTSUPP;
    +        goto end_io;
    +    }

Neither QUEUE_ORDERED_TAG nor QUEUE_ORDERED_FLUSH is ever set in mainline.
Of course, we always like more performance and more kernel tweaking; however, there are good reasons to respect the rules of a freeze.

And I'll stick with the "we" thanks. Vendors are perfectly happy to continue to distinguish their offerings by carrying performance patches, whereas the rest of us are probably more interested in an orderly march to 2.6.0.

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds