
The end of block barriers

Posted Aug 19, 2010 8:09 UTC (Thu) by butlerm (subscriber, #13312)
Parent article: The end of block barriers

The basic limitation I see with this is that a complete write cache flush is not close to optimal on sufficiently intelligent devices. That is where I/O threads come in. The complaint is that a write barrier makes it difficult to reorder write requests in the block subsystem. There is an easy solution to that: make the write barrier apply to only one I/O thread.

Then the block layer can reorder requests arbitrarily as long as the I/O-thread-specific ordering constraints are met. And when a thread-specific write barrier needs to be issued to a lower-level device, it can be translated to a device-level write barrier instead of a full write cache flush, without impairing the higher-level block reordering at all.
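
To make that rule concrete, here is a minimal user-space sketch; the structure, field names, and may_reorder() are invented for illustration (the kernel block layer has no such interface today):

    /* Sketch of the per-I/O-thread barrier rule proposed above.
     * Requests from different I/O threads are never ordered with
     * respect to each other; within one thread, nothing may be
     * moved across a barrier request.
     */
    #include <stdbool.h>
    #include <stdio.h>

    struct io_req {
        unsigned int  stream_id;   /* which I/O thread issued it */
        bool          barrier;     /* per-thread barrier request */
        unsigned long sector;
    };

    /* May the block layer swap two queued requests? */
    static bool may_reorder(const struct io_req *a, const struct io_req *b)
    {
        if (a->stream_id != b->stream_id)
            return true;                    /* unrelated threads: always */
        return !a->barrier && !b->barrier;  /* same thread: not across a barrier */
    }

    int main(void)
    {
        struct io_req w1 = { .stream_id = 1, .barrier = false, .sector = 100 };
        struct io_req b1 = { .stream_id = 1, .barrier = true,  .sector = 108 };
        struct io_req w2 = { .stream_id = 2, .barrier = false, .sector = 5000 };

        printf("w1 <-> b1: %d\n", may_reorder(&w1, &b1));  /* 0: same thread, barrier */
        printf("b1 <-> w2: %d\n", may_reorder(&b1, &w2));  /* 1: different threads */
        return 0;
    }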

In addition, intelligent block devices that support I/O threads could use the I/O-thread-specific barrier operation to order just the requests on that thread instead of flushing the entire write cache. iSCSI theoretically supports this now if you map I/O threads to connections (initiator-target nexuses), although the SCSI architectural model doesn't seem to be designed to do it that way.

However, with proper kernel support, a block-layer protocol such as the one used by DRBD could presumably be adapted to support I/O-thread-specific barrier operations. That would allow the remote device to maintain a much larger write cache, one that doesn't need to be flushed every time a process calls fsync, or forced to order all of its block operations just to satisfy the ordering constraints of a single I/O thread. Provided there were a way to flush the cache for just one I/O thread, or (even better) to be notified when the writes prior to an I/O-thread-specific barrier had completed, such support would reduce fsync latency on a busy device dramatically.
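
As a sketch of what that might look like on the wire (these message types and fields are invented for illustration and are not part of the actual DRBD protocol):

    /* Hypothetical per-I/O-thread flush support for a DRBD-style
     * replication protocol.  Invented for illustration only.
     */
    #include <stdint.h>

    enum repl_msg_type {
        R_WRITE        = 1,   /* data block, tagged with its I/O thread */
        R_STREAM_FLUSH = 2,   /* make one thread's prior writes stable */
        R_FLUSH_ACK    = 3,   /* writes up to 'seq' on 'stream' are stable */
    };

    struct repl_msg {
        uint16_t type;        /* enum repl_msg_type */
        uint16_t stream;      /* I/O thread identifier */
        uint64_t seq;         /* per-stream sequence number */
        uint32_t len;         /* payload length for R_WRITE, else 0 */
    };

    /* An fsync() on the initiator then only has to send R_STREAM_FLUSH
     * for its own stream and wait for the matching R_FLUSH_ACK, instead
     * of forcing the remote node to flush its entire write cache. */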

It is not as if someone writing a distributed database protocol would ask a remote database node to flush every dirty block in its cache to ensure a commit; rather, a request would be sent to commit just the blocks associated with a particular transaction. I don't see why a distributed block device should be any different, and I/O threads are a simple way of making that happen. I/O threads, or something like them, are the future of block devices. A full write cache flush makes the BKL look like an exercise in efficiency.



The end of block barriers

Posted Aug 19, 2010 11:31 UTC (Thu) by zmi (guest, #4829)

You partly answered a question I had about the article: what makes the new approach that much faster than barriers? If I still send a "cache must be flushed" request to the device, even normal writes are stopped until the device flushes its cache, and it will take some time to fill the cache again to get good performance; during that time another flush might occur...

Think of a simple file server to which three people copy files:
Person 1 copies his MP3 collection to the server.
Person 2 copies her HD-video collection to the server.
Person 3 runs a highly parallel, write-intensive database.

While persons 1 and especially 2 will generate very few metadata operations, person 3 generates lots of fsyncs. From the article I gather that a full cache flush would happen, suspending even the normal writes of persons 1 and 2. Or am I misinterpreting that?
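
For what it's worth, a minimal user-space sketch of that scenario (file names and sizes are arbitrary): one thread streams bulk writes while another repeatedly appends a small record and calls fdatasync, so the reported commit latencies can be compared with and without the streaming load:

    /* One thread streams bulk data (persons 1+2) while the main thread
     * does small fdatasync'd writes (person 3) and reports each commit
     * latency.  Build: cc -O2 -pthread fsync_vs_stream.c -o fsync_vs_stream
     */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    static void *streamer(void *arg)
    {
        int fd = open("bulk.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        static char buf[1 << 20];           /* 1 MiB of zeroes */
        for (int i = 0; i < 512; i++)        /* write 512 MiB, never fsync */
            if (write(fd, buf, sizeof(buf)) < 0)
                break;
        close(fd);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, streamer, NULL);

        int fd = open("journal.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        char rec[4096];
        memset(rec, 'j', sizeof(rec));

        for (int i = 0; i < 100; i++) {      /* 100 small "commits" */
            struct timespec a, b;
            if (write(fd, rec, sizeof(rec)) < 0)
                break;
            clock_gettime(CLOCK_MONOTONIC, &a);
            fdatasync(fd);                   /* forces the data (and a cache flush) out */
            clock_gettime(CLOCK_MONOTONIC, &b);
            printf("commit %3d: %.1f ms\n", i,
                   (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6);
        }
        close(fd);
        pthread_join(t, NULL);
        return 0;
    }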

The end of block barriers

Posted Aug 19, 2010 13:13 UTC (Thu) by corbet (editor, #1)

The cache flush can happen in parallel with other I/O. Forcing specific blocks to persistent media can only slow things down, but they have to get there soon in any case. While the drive is executing the cache flush, it can be satisfying other requests whenever it's convenient. "Cache flush" doesn't mean "write only blocks in the cache" or "don't satisfy outstanding reads while you're at it". It will be far more efficient than a full queue drain.

The end of block barriers

Posted Aug 19, 2010 15:25 UTC (Thu) by butlerm (subscriber, #13312)

The problem is that, in general, an fsync requires a journal commit. If you have to flush the entire write cache, throughput for threads that are not serialized on fsync might be fine, but the performance of threads that do call fsync will suffer seriously due to the delay.

Flushing the device's write cache frequently also moves the request ordering/merging efficiency problem down a level. If the device must flush its entire write cache on a regular basis, it has much less opportunity to order write operations optimally for that device.
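
To put illustrative numbers on that (my own, not from the article): if a drive is holding 32 MB of dirty data and can write it back at 80 MB/s, a full cache flush costs roughly 400 ms, and an fsync that committed a single 4 KB journal record still waits the full 400 ms, because the flush cannot tell that record's blocks apart from everyone else's.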

