Posted May 15, 2012 0:55 UTC (Tue) by dlang (✭ supporter ✭, #313)
In reply to: A bcache update by koverstreet
Parent article: A bcache update
>> It would also seem that the best way to run such a cache would be with a relatively tight integration with the raid layer, such that backing store devices are marked with metadata indicating the association with the cache device, so that the group of them can properly be reassembled after a hardware failure, possibly on a different system. The raid layer could then potentially use the cache as a write intent log as well, which is a big deal for raid 5/6 setups.
> I think if you're using bcache for writeback caching on top of your raid5/6, things will work out pretty well without needing any tight integration; bcache's writeback tries hard to gather up big sequential IOs, and raid5/6 will handle those just fine.
Where tight integration would be nice is if bcache could align writeback to the RAID stripe size and stripe boundaries, the same way it tries to align SSD writes to the erase block size and boundaries.
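To make the alignment arithmetic concrete, here's a rough sketch (not bcache code; the function name and the choice of sectors as the unit are made up for illustration). It assumes "stripe" means the full data stripe, i.e. chunk size times the number of data disks, and splits a dirty extent into an unaligned head, a run of whole stripes, and an unaligned tail. Only the middle piece can be written to raid5/6 without a read-modify-write.

```python
def split_on_stripes(start, length, stripe):
    """Split the extent [start, start + length) into (head, full, tail).

    Each piece is an (offset, length) pair; a length of 0 means the piece
    is empty. `stripe` is the full data stripe size in the same units as
    `start` and `length` (hypothetical units: sectors).
    """
    end = start + length
    first_full = -(-start // stripe) * stripe  # round start up to a boundary
    last_full = (end // stripe) * stripe       # round end down to a boundary
    if first_full >= last_full:
        # The extent does not cover any whole stripe.
        return (start, length), (first_full, 0), (end, 0)
    head = (start, first_full - start)
    full = (first_full, last_full - first_full)
    tail = (last_full, end - last_full)
    return head, full, tail


if __name__ == "__main__":
    print(split_on_stripes(100, 1000, 256))
```

With a 256-sector stripe, an extent starting at sector 100 with length 1000 has three whole stripes in the middle and a partial stripe on each side.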
Posted May 15, 2012 3:43 UTC (Tue) by koverstreet (subscriber, #4296)
For that we wouldn't really need tight integration - you could conceivably just tell bcache the stripe size of the backing device, and it wouldn't consider partial stripes sequential with full stripes.
But for that to be useful we'd have to have writeback specifically pick bigger sequential chunks of dirty data and skip smaller ones, and I'm not sure how useful that actually is. Right now it just flushes dirty data in sorted order, which makes things really easy and works quite well in practice - in particular for regular hard drives, even if your dirty data isn't purely sequential you're still minimizing seek distance. And if you let gigabytes of dirty data buffer up (via writeback_percent) - the dirty data's going to be about as sequential as it's gonna get.
But yeah, there's all kinds of interesting tricks we could do.
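As a toy illustration of why flushing in sorted order works well on spinning disks (this is not the bcache implementation; the names and the "head starts at offset 0" model are assumptions for the example), the sketch below merges adjacent dirty extents and compares total seek distance for arrival order versus sorted order.

```python
def coalesce_sorted(extents):
    """Sort (offset, length) extents and merge those that touch or overlap."""
    runs = []
    for off, length in sorted(extents):
        if runs and off <= runs[-1][0] + runs[-1][1]:
            last_off, last_len = runs[-1]
            # Extend the previous run to cover this extent as well.
            runs[-1] = (last_off, max(last_len, off + length - last_off))
        else:
            runs.append((off, length))
    return runs


def seek_distance(order):
    """Total head travel when writing extents in the given order.

    Simplified model: the head starts at offset 0 and ends each write at
    offset + length.
    """
    pos = 0
    total = 0
    for off, length in order:
        total += abs(off - pos)
        pos = off + length
    return total


if __name__ == "__main__":
    dirty = [(900, 10), (0, 8), (8, 8), (500, 4)]
    print(seek_distance(dirty), seek_distance(coalesce_sorted(dirty)))
```

The more dirty data is allowed to accumulate before writeback starts, the more neighbours each extent has to merge with, which is the effect described above for large writeback_percent settings.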
A bcache update
Posted May 15, 2012 22:34 UTC (Tue) by intgr (subscriber, #39733)
> But for that to be useful we'd have to have writeback specifically pick bigger sequential chunks of dirty data and skip smaller ones, and I'm not sure how useful that actually is
That's certainly an important optimization for mixed random/sequential write workloads whose working set is larger than the SSD. To make best use of both kinds of disks, random writes should persist on the SSD as long as possible, whereas longer sequential writes should be pushed out quickly to make more room for random writes.
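A minimal sketch of the policy I have in mind, assuming the dirty data has already been grouped into contiguous runs (the function name and the `target` parameter are hypothetical, not anything bcache exposes): pick the longest runs first until enough space would be freed, and leave the small random writes on the SSD.

```python
def pick_for_writeback(runs, target):
    """Choose dirty runs to flush, longest first.

    `runs` is a list of (offset, length) pairs; `target` is the amount of
    dirty data we want to flush, in the same units as length. The chosen
    runs are returned in offset order so the flush itself stays sorted.
    """
    chosen = []
    freed = 0
    for off, length in sorted(runs, key=lambda r: r[1], reverse=True):
        if freed >= target:
            break
        chosen.append((off, length))
        freed += length
    return sorted(chosen)


if __name__ == "__main__":
    runs = [(0, 16), (500, 4), (900, 10), (2000, 64)]
    print(pick_for_writeback(runs, target=70))
```

Here the two longest runs are flushed and the 4- and 10-unit writes stay cached, where they are more likely to be overwritten or merged before they ever hit the disk.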
A bcache update
Posted May 16, 2012 1:52 UTC (Wed) by koverstreet (subscriber, #4296)
Well, you want the sequential writes to bypass the cache, which is what bcache does.
If you can show me a workload that'd benefit though - I'm not opposed to the idea, it's just a question of priorities.
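For anyone wondering what "bypass" means here, the idea looks roughly like the sketch below. This is a simplified illustration, not the actual bcache code: the class name and the `cutoff` parameter are made up, and it tracks a single IO stream where a real implementation would need to track several at once.

```python
class SequentialDetector:
    """Flag IO as sequential once a contiguous run reaches `cutoff` bytes."""

    def __init__(self, cutoff):
        self.cutoff = cutoff      # hypothetical threshold, in bytes
        self.next_offset = None   # where the current run would continue
        self.run_bytes = 0        # length of the current contiguous run

    def should_bypass(self, offset, length):
        """Return True if this IO should skip the cache."""
        if offset == self.next_offset:
            self.run_bytes += length   # continues the previous IO
        else:
            self.run_bytes = length    # a new run starts here
        self.next_offset = offset + length
        return self.run_bytes >= self.cutoff


if __name__ == "__main__":
    d = SequentialDetector(cutoff=32)
    for io in [(0, 16), (16, 16), (1000, 8), (1008, 8)]:
        print(io, d.should_bypass(*io))
```

The first IOs of a long stream still land in the cache under this scheme; only the part past the threshold goes straight to the backing device.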
A bcache update
Posted May 16, 2012 7:35 UTC (Wed) by intgr (subscriber, #39733)
> Well, you want the sequential writes to bypass the cache, which is what bcache does
Oh, I didn't realize that. That should take care of most of it. There's probably still some benefit to sorting writeback by size, but I'm not sure whether it's worth the complexity.