
A bcache update

Posted May 16, 2012 18:14 UTC (Wed) by butlerm (subscriber, #13312)
Parent article: A bcache update

Next question. Given that there is a write-back cache on persistent media, does that mean that when the block layer issues a write cache flush or force unit access (FUA) command, bcache synchronously forces the blocks to the SSD, but _not_ synchronously to the backing device?

Assuming the cache is reliable and persistent, that is exactly what you want in most situations. If a higher level cache flush command actually forced a flush all the way through a persistent write-back cache to the backing device, write-back mode would be approximately useless; you might as well just leave the data in RAM.
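
For illustration, here is a toy userspace C model of the semantics being described - hypothetical names, not actual bcache code: a flush makes dirty data durable on the persistent cache device and deliberately stops there.

    /*
     * Toy userspace model of the flush semantics described above.
     * Hypothetical names; this is not actual bcache code.
     */
    #include <stdio.h>

    struct cache_dev   { int dirty_blocks;    };  /* persistent SSD cache */
    struct backing_dev { int unsynced_blocks; };  /* slow rotating disk */

    /*
     * On a flush, make the data durable on the *cache* device only.
     * Because the SSD is persistent, the data survives a power loss
     * even though it has not reached the backing disk yet.
     */
    static void handle_flush(struct cache_dev *c, struct backing_dev *b)
    {
        printf("flushing %d dirty blocks to SSD\n", c->dirty_blocks);
        c->dirty_blocks = 0;
        (void)b;  /* backing device untouched; writeback proceeds lazily */
    }

    int main(void)
    {
        struct cache_dev   c = { .dirty_blocks    = 8 };
        struct backing_dev b = { .unsynced_blocks = 8 };

        handle_flush(&c, &b);
        printf("backing device still holds %d unsynced blocks\n",
               b.unsynced_blocks);
        return 0;
    }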



A bcache update

Posted May 17, 2012 1:29 UTC (Thu) by koverstreet (✭ supporter ✭, #4296)

The problem is that if you haven't been writeback caching _everything_ - i.e. your sequential writes have been bypassing the cache - you still need that cache flush to the backing device.

If a mode where all writes were writeback cached and cache flushes were never sent to the backing device would be useful, it would be trivial to add one.
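
A toy C sketch of the complication, with hypothetical names - not actual bcache code: once any write has bypassed the cache, the next flush has to be forwarded to the backing device as well, because that device's volatile cache may still hold the bypassed data.

    /*
     * Toy model: once any write has bypassed the cache, a flush must
     * be forwarded to the backing device too. Hypothetical names;
     * not actual bcache code.
     */
    #include <stdbool.h>
    #include <stdio.h>

    static bool bypassed_write_pending;  /* a write went straight to disk */

    static void submit_write(bool sequential)
    {
        if (sequential) {
            /* large sequential writes skip the SSD entirely */
            bypassed_write_pending = true;
            puts("write -> backing device (bypassed the cache)");
        } else {
            puts("write -> SSD (write-back cached)");
        }
    }

    static void submit_flush(void)
    {
        puts("flush dirty data on the SSD");  /* always cheap */
        if (bypassed_write_pending) {
            /*
             * The backing device's volatile cache may still hold the
             * bypassed writes, so the flush cannot stop at the SSD.
             */
            puts("forward flush to the backing device");
            bypassed_write_pending = false;
        }
    }

    int main(void)
    {
        submit_write(false);  /* random write: cached */
        submit_write(true);   /* sequential write: bypassed */
        submit_flush();       /* must reach both devices */
        return 0;
    }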

A bcache update

Posted May 17, 2012 6:30 UTC (Thu) by butlerm (subscriber, #13312)

This is why the block layer really ought to support write "threads" with thread-specific write barriers. It is trivial to convert a barrier to a cache flush where it is impractical to do something more intelligent, but thread-specific barriers are ideal for fast commits of journal entries and the like.

In a typical journalling filesystem, it is usually only the journal entries that need to be synchronously flushed at all. Most writes can be applied independently. If a mounted filesystem had two or more I/O "threads" (or flows) - one for journal and synchronous data writes, and one for ordinary data writes - an intelligent lower layer could handle a barrier on one thread by flushing a small amount of journal data, while the other one takes its own sweet time - with a persistent cache, even across a system restart if necessary.
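
A toy C sketch of this proposed design - hypothetical, not a real block layer API: a barrier on the journal flow drains only that flow's small queue, leaving the bulk data flow to drain at its own pace.

    /*
     * Toy model of per-flow write barriers. Hypothetical design, not
     * a real block layer API: a barrier drains only its own flow.
     */
    #include <stdio.h>

    enum flow { FLOW_JOURNAL, FLOW_DATA, FLOW_MAX };

    static int queued[FLOW_MAX];  /* pending writes per flow */

    static void queue_write(enum flow f)
    {
        queued[f]++;
    }

    static void barrier(enum flow f)
    {
        /*
         * Only the (small) queue for this flow is flushed; a large
         * backlog on the other flow does not delay the commit.
         */
        printf("barrier on flow %d: flushing %d write(s)\n", f, queued[f]);
        queued[f] = 0;
    }

    int main(void)
    {
        queue_write(FLOW_DATA);
        queue_write(FLOW_DATA);
        queue_write(FLOW_JOURNAL);  /* journal commit record */

        barrier(FLOW_JOURNAL);      /* fast: flushes 1 write, not 3 */
        printf("data flow still has %d write(s) in flight\n",
               queued[FLOW_DATA]);
        return 0;
    }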

Otherwise, the larger the write cache, the larger the delay when a journal commit comes along. Call it a block layer version of bufferbloat. As with networking, other than making the buffer smaller, the typical solution is to use multiple class- or flow-based queues. If you don't have flow-based queuing, you really don't want much of a buffer at all, because it causes latency to skyrocket.

As a consequence, I don't see how write-back caching can help very much here, unless all writes (or at least metadata for all out-of-order writes) are queued in the cache, so that write ordering is not broken. Am I wrong?

