Posted May 14, 2012 23:25 UTC (Mon) by butlerm (subscriber, #13312)
Parent article: A bcache update
My number one question is whether bcache preserves the (safely written) contents of the cache across restarts. That would be a major advantage in avoiding cache warm-up time in many applications, particularly large databases.
In addition, clearing out the first block of the backing device doesn't seem like such a great idea in simple configurations. If the SSD fails, the remaining filesystem would be practically guaranteed to be in a seriously inconsistent state on recovery. I would be reluctant to use this in a sufficiently critical setup without the ability to mirror the cache devices.
It would also seem that the best way to run such a cache is with relatively tight integration with the RAID layer: backing store devices would be marked with metadata indicating their association with the cache device, so that the group of them can be properly reassembled after a hardware failure, possibly on a different system. The RAID layer could then potentially use the cache as a write intent log as well, which is a big deal for RAID 5/6 setups.
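To make the reassembly idea concrete, here is a rough sketch (purely illustrative, not bcache's actual on-disk format; the superblock fields and function names are my own invention): each backing device carries the UUID of the cache set it belongs to, so discovered devices can be matched back up even on a different machine.

```python
import uuid

def make_cache_set():
    # A fresh identifier for one cache device plus its backing devices.
    return str(uuid.uuid4())

def tag_backing_device(superblock, cache_set_uuid):
    # Record the association in the backing device's metadata.
    superblock["cache_set"] = cache_set_uuid
    return superblock

def assemble(devices):
    """Group discovered devices by the cache set they claim to belong to."""
    sets = {}
    for dev in devices:
        sets.setdefault(dev["cache_set"], []).append(dev["name"])
    return sets

cset = make_cache_set()
devs = [tag_backing_device({"name": n}, cset) for n in ("sda", "sdb", "sdc")]
print(assemble(devs))  # all three devices grouped under one cache set
```

The point is simply that the grouping survives hardware moves because it lives in each member's own metadata, not in any one system's configuration.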
If you want write performance improvements, the ideal thing is not to synchronously flush dirty data in the cache to the backing store on receipt of a write barrier, but rather to journal the writes themselves on the cache device, so that they can be applied in the correct order on system recovery. Otherwise you get a milder version of the problem where synchronous writes are held up by asynchronous ones, which can dramatically lengthen an fsync, a database commit, and similar operations whenever other asynchronous writes are pending.
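A toy model of what I mean (my own sketch, not bcache's design): writes are appended to an ordered journal on the cache device, a barrier becomes essentially free because ordering is already captured, and recovery replays the journal in order so the backing store never ends up with writes applied out of sequence.

```python
class JournalingCache:
    def __init__(self):
        self.journal = []   # ordered (offset, data) records on the cache device
        self.backing = {}   # simulated backing store

    def write(self, offset, data):
        # A cheap sequential append; nothing touches the backing store yet.
        self.journal.append((offset, data))

    def barrier(self):
        # No synchronous flush of dirty data: ordering is already durable
        # in the journal, so a barrier (or an fsync behind it) returns
        # without waiting on unrelated asynchronous writes.
        pass

    def recover(self):
        # After a crash, apply the journaled writes in their original order.
        for offset, data in self.journal:
            self.backing[offset] = data
        self.journal.clear()

cache = JournalingCache()
cache.write(0, b"commit record")
cache.barrier()              # does not stall behind asynchronous writes
cache.write(8, b"bulk data")
cache.recover()
print(cache.backing)
```

The contrast with flush-on-barrier is that a commit's latency here depends only on the journal append, not on however much dirty data happens to be queued.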
On the other hand, a device like this could make it possible for the RAID/LVM layers to handle write barriers properly, i.e. with context-specific barriers that apply only to one logical volume, or to a subset of the writes from a specific filesystem, without risking serious inconsistency due to the usual inability to write synchronously to a group of backing store devices. With a reliable, low-overhead write intent log you can avoid that problem and do it right, degrading to flush-the-world mode when a persistent intent log is not available.
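To illustrate the degradation I have in mind (again a hypothetical sketch, with invented names): a barrier drains only the issuing volume's queued writes when an intent log is available, and falls back to draining everything when it is not.

```python
class VolumeBarriers:
    def __init__(self, has_intent_log):
        self.has_intent_log = has_intent_log
        self.pending = {}   # volume name -> queued writes

    def write(self, volume, data):
        self.pending.setdefault(volume, []).append(data)

    def barrier(self, volume):
        if self.has_intent_log:
            # Context-specific barrier: only this volume's writes are drained;
            # other volumes' asynchronous traffic is untouched.
            return self.pending.pop(volume, [])
        # No persistent intent log: degrade to flush-the-world.
        flushed = [w for queue in self.pending.values() for w in queue]
        self.pending.clear()
        return flushed

vb = VolumeBarriers(has_intent_log=True)
vb.write("db", b"commit")
vb.write("scratch", b"tmp")
assert vb.barrier("db") == [b"commit"]   # scratch volume's writes stay queued
```

That is the whole attraction: one busy filesystem's barriers stop taxing every other volume sharing the same set of backing store devices.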