The basic limitation I see with this is that a complete write cache flush is nowhere near optimal on sufficiently intelligent devices. That is where I/O threads come in. The usual complaint is that a write barrier makes it difficult to reorder write requests in the block subsystem. There is an easy solution to that: make the write barrier apply to only one I/O thread.
Then the block layer can reorder requests arbitrarily as long as the I/O thread specific ordering constraints are met. And when a thread specific write barrier needs to be issued to a lower level device, it can be translated to a device level write barrier instead of issuing a full write cache flush, without impairing the higher level block reordering at all.
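To make the reordering argument concrete, here is a rough Python sketch (a toy model, not kernel code). Everything in it is an assumption made for illustration: the `Request` tuple, the idea of tagging each write with a per-thread "epoch" (the number of barriers its own thread issued before it), and the greedy lowest-block-first policy standing in for a real elevator.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Request(NamedTuple):
    thread: int            # hypothetical I/O thread id
    block: Optional[int]   # target block; None marks a thread-specific barrier


def epochs(requests):
    """Tag each write with how many barriers its own thread issued before it."""
    barriers_seen = defaultdict(int)
    tagged = []
    for req in requests:
        if req.block is None:
            barriers_seen[req.thread] += 1
        else:
            tagged.append((req.thread, barriers_seen[req.thread], req.block))
    return tagged


def reorder(requests):
    """Greedy toy elevator: issue the lowest block among writes whose own
    thread has no write from an earlier epoch still pending."""
    pending = epochs(requests)
    issued = []
    while pending:
        oldest = {}
        for thread, epoch, _ in pending:
            oldest[thread] = min(epoch, oldest.get(thread, epoch))
        ready = [w for w in pending if w[1] == oldest[w[0]]]
        nxt = min(ready, key=lambda w: w[2])
        pending.remove(nxt)
        issued.append(nxt)
    return issued


def respects_thread_barriers(issued):
    """True if no thread has a write issued ahead of one of its earlier epochs."""
    last = {}
    for thread, epoch, _ in issued:
        if epoch < last.get(thread, 0):
            return False
        last[thread] = epoch
    return True


queue = [
    Request(1, 90), Request(1, None), Request(1, 10),  # thread 1: 90, barrier, 10
    Request(2, 50), Request(2, 20),                    # thread 2: no barrier
]
order = reorder(queue)
print([block for _, _, block in order])
```

In this model thread 2's writes are free to move ahead of thread 1's barrier, because the barrier constrains only thread 1. A global barrier would have forced block 90 out before any of the writes queued behind it.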
In addition, an intelligent block device that supports I/O threads could use the I/O thread specific barrier operation to order just the requests on that I/O thread instead of flushing its entire write cache. iSCSI theoretically supports this now if you map I/O threads to connections (initiator target nexuses), although the SCSI architectural model doesn't seem to be designed to do it that way.
However (with proper kernel support) a block layer protocol such as that used by DRBD could presumably easily be adapted to support I/O thread specific barrier operations, allowing the remote device to maintain a much larger write cache, one that doesn't need to be flushed every time a process calls fsync, or even forced to order all its block operations to satisfy the ordering constraints of just one I/O thread. Provided there was a way to flush the cache for just one I/O thread, or (even better) be notified when the writes prior to an I/O thread specific barrier had completed, such support would reduce fsync latency on a busy device dramatically.
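As a rough illustration of why that would help fsync latency, here is a toy write cache in Python that can flush the dirty blocks of a single I/O thread. The class and method names (`ThreadAwareCache`, `flush_thread`, `flush_all`) are hypothetical, and this does not reflect how DRBD or any real device works. It only shows the difference in the amount of work each kind of flush has to do.

```python
class ThreadAwareCache:
    """Toy write-back cache that remembers which I/O thread dirtied each block."""

    def __init__(self):
        self.dirty = {}    # block -> (data, set of thread ids that dirtied it)
        self.stable = {}   # block -> data; stands in for stable storage

    def write(self, thread, block, data):
        _, owners = self.dirty.get(block, (None, set()))
        self.dirty[block] = (data, owners | {thread})

    def flush_thread(self, thread):
        """Write back only the blocks this thread dirtied; return how many."""
        mine = [b for b, (_, owners) in self.dirty.items() if thread in owners]
        for block in mine:
            data, _ = self.dirty.pop(block)
            self.stable[block] = data
        return len(mine)

    def flush_all(self):
        """Full cache flush: write back every dirty block; return how many."""
        count = len(self.dirty)
        for block, (data, _) in self.dirty.items():
            self.stable[block] = data
        self.dirty.clear()
        return count


cache = ThreadAwareCache()
for block in range(3):              # thread 1: a small transaction
    cache.write(1, block, b"txn")
for block in range(1000, 1100):     # thread 2: unrelated bulk writes
    cache.write(2, block, b"bulk")

flushed = cache.flush_thread(1)     # what an fsync on thread 1 would need
print(flushed, len(cache.dirty))
```

Here the fsync for thread 1 writes back three blocks and leaves thread 2's hundred dirty blocks in the cache, where a full flush would have had to push out all of them.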
If one were writing a distributed database protocol, one would not ask a remote database node to flush every dirty block in its cache to ensure a commit. Rather, a request would be sent to commit just the blocks associated with a certain transaction. I don't see why a distributed block device should be any different, and I/O threads are a simple way of making that happen. I/O threads or something like them are the future of block devices. A full write cache flush makes the BKL look like an exercise in efficiency.