Facebook releases Flashcache

Facebook has released a kernel module called Flashcache that it uses to speed up MySQL by caching data on SSDs. The code is available on GitHub, but has only been tested on kernel versions 2.6.18 and 2.6.20. "We built Flashcache to help us scale InnoDB/MySQL, but it was designed as a generic caching module that can be used with any application built on top of any block device. For InnoDB, when the working set does not fit in the InnoDB buffer pool, read latency is significantly improved due to caching more of the working set in faster media, such as SSD's. We also improve write performance by first caching writes in SSD's and lazily flushing the data back to disk." (Thanks to Ray Van Dolson.)

Facebook releases Flashcache

Posted Apr 29, 2010 22:15 UTC (Thu) by jstultz (subscriber, #212) [Link]

Looks like it's based on the dm-cache work. Very cool to see that moving again. http://users.cis.fiu.edu/~zhaom/dmcache/index.html

Facebook releases Flashcache

Posted Apr 30, 2010 1:20 UTC (Fri) by paragw (guest, #45306) [Link]

For a moment I thought now that Adobe is done exploiting GPU acceleration, next step was to explore kernel modules to hide the CPU usage and slow down the fans.

Facebook releases Flashcache

Posted Apr 30, 2010 15:07 UTC (Fri) by ThinkRob (subscriber, #64513) [Link]

Either that, or a clever L2-based monetization ploy...

Facebook releases Flashcache

Posted Apr 30, 2010 6:48 UTC (Fri) by butlerm (guest, #13312) [Link]

What I want to know is what recovery mechanisms there are for flushing all the delayed writes from the flash cache to the actual media after a system crash. No one in his or her right mind would use a writeback cache without them, right?

Facebook releases Flashcache

Posted Apr 30, 2010 7:40 UTC (Fri) by ledow (guest, #11753) [Link]

Looks like you just activate the same cache device again as you would on any normal boot and it would carry on from where it left off. You might lose a block or two of data that was unwritten but that's where things like journalling filesystems / transactional databases make MUCH more difference than what the writeback cache does.

Facebook releases Flashcache

Posted Apr 30, 2010 8:28 UTC (Fri) by fperrin (subscriber, #61941) [Link]

It depends on the consequences of losing a couple of transactions. If you're Facebook and you lose a dozen status updates or wall messages, does it really matter a lot? Of course, if you're NYSE and you lose some transactions, your users won't be very happy.

Facebook releases Flashcache

Posted Apr 30, 2010 12:00 UTC (Fri) by rstreeks (subscriber, #1018) [Link]

We all instinctively think that lost NYSE transactions are more important than lost Facebook updates because of the $ value. But to the end user there is no difference: both groups get upset about lost transactions. With one you can easily put a $ value on it, but with the other you can't.
In both situations you have to do a lot of damage control.

Facebook releases Flashcache

Posted Apr 30, 2010 13:03 UTC (Fri) by cowsandmilk (subscriber, #55475) [Link]

Facebook doesn't make guarantees to users about performing transactions, and that's part of the point. The more 9's you require in reliability, the more expensive it is computationally. Facebook makes a system that gives enough reliability to make users happy. Occasional double posts or lost posts will be viewed by most users as their own error. If the NYSE doubles my order, I'm going to be pissed off. So, this system might not have the reliability for stock exchanges, but that's not what it was written for...

Facebook releases Flashcache

Posted May 6, 2010 4:52 UTC (Thu) by butlerm (guest, #13312) [Link]

"It depends on the consequences of losing a couple of transactions"

A well designed persistent writeback cache should be able to store several minutes of writes, assuming that it is required to be present when the system reboots.

If you are just using it as a glorified write-through cache, there are no special requirements. But for most journalled filesystems and fsync-heavy database applications a write-through cache won't improve performance any more than adding a large amount of RAM (aside from the possibility of reducing cache warmup time on system restart).
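To make the distinction concrete, here is a toy model of the three cases discussed in this thread. It is only a sketch and is not how Flashcache is implemented: the `ToyCache` class and its `write_back` and `persistent_cache` flags are invented for illustration, and it assumes that a persistent cache also keeps its dirty-block list across the crash.

```python
class ToyCache:
    """Toy block cache in front of a 'disk' dict (hypothetical, for illustration)."""

    def __init__(self, write_back, persistent_cache):
        self.write_back = write_back
        self.persistent_cache = persistent_cache
        self.disk = {}      # backing store: block number -> data
        self.cache = {}     # cached copies of blocks
        self.dirty = set()  # blocks acknowledged but not yet on disk

    def write(self, block, data):
        self.cache[block] = data
        if self.write_back:
            # Acknowledge immediately; the disk is updated later by flush().
            self.dirty.add(block)
        else:
            # Write-through: the disk is updated before the write returns.
            self.disk[block] = data

    def flush(self):
        # Lazily copy every dirty block back to the disk.
        for block in sorted(self.dirty):
            self.disk[block] = self.cache[block]
        self.dirty.clear()

    def crash_and_recover(self):
        if self.persistent_cache:
            # Assumed: cache contents and dirty list survived, so replay them.
            self.flush()
        else:
            # Volatile cache: everything not yet on disk is gone.
            self.cache.clear()
            self.dirty.clear()
        return self.disk
```

With a single acknowledged write followed by a crash, the write-through and persistent write-back configurations end up with the block on disk, while a volatile write-back cache loses it.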

To speed up any application that issues synchronous writes, including the filesystem journal itself, reliable writeback capability must be present. Typical journalled filesystems do lose several seconds of user level transactions on recovery. But when a filesystem wants to complete a metadata transaction it must issue a full barrier operation (generally meaning writing all dirty metadata buffers to disk) before it can continue or it cannot provide any recovery guarantees _at all_.

Similar thing with databases and other applications that use fsync. If fsync returns before the file data is committed to persistent storage, there is a substantial risk that the database will be completely corrupted. All the database redo logs and so forth are potentially worthless if the database cannot be confident that certain writes have actually been made persistent.
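The commit pattern at stake looks roughly like the sketch below. It is not taken from any particular database; `durable_append`, `demo`, and the `redo.log` file name are made up for the example. The caller treats a record as committed only after `os.fsync` returns, which is exactly the promise that a cache underneath must not break.

```python
import os
import tempfile


def durable_append(path, record):
    """Append record (bytes) to path and return only after fsync succeeds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(record)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the file data down to the storage device.
        os.fsync(fd)
    finally:
        os.close(fd)


def demo():
    """Append two hypothetical redo-log records to a temp file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "redo.log")
        durable_append(log, b"txn 1 commit\n")
        durable_append(log, b"txn 2 commit\n")
        with open(log, "rb") as f:
            return f.read()
```

Note that `os.fsync` is only as trustworthy as the layers below it: if a block-level cache acknowledges the flush before the data is on persistent media, the application has no way to notice.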

As in the filesystem case, even if you don't care about losing the last few seconds of user transactions, if you want to recover your database the database itself must be able to either commit writes and commit them now or have a block device that can provide full write barrier guarantees.

Ultimately this is a performance issue: if the flash cache provides the recoverable synchronous write guarantees, the latency for a database commit (or similar fsync-requiring operation) can drop by a couple of orders of magnitude.

Facebook releases Flashcache

Posted May 6, 2010 17:20 UTC (Thu) by Simetrical (subscriber, #53439) [Link]

Enterprise disks already have recoverable synchronous write guarantees, by means of battery-backed disk controllers. The advantage of using an SSD as a write-through buffer instead of battery-backed RAM is just that SSDs are much cheaper than RAM. Likewise, the advantage of using SSDs instead of RAM for read caching is that you get a much bigger read cache for the same amount of money. (Plus it's pre-populated on boot, but that's not a big deal for servers.)

So, no, it might not improve performance any more than adding the same amount of RAM, but the same amount of RAM costs a lot more. :) You don't need SSDs for caching if your dataset already fits in memory, but not all datasets fit in memory.
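A small simulation illustrates the point about datasets that don't fit in memory. It is a sketch under simple assumptions, not a model of Flashcache or InnoDB: both tiers are plain LRU caches, the `LRU` class and `disk_reads` function are invented here, and the workload is a repeated sequential scan.

```python
from collections import OrderedDict


class LRU:
    """Minimal least-recently-used set of block numbers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def touch(self, key):
        """Return True on a hit; on a miss, insert key and evict the oldest."""
        if key in self.items:
            self.items.move_to_end(key)
            return True
        self.items[key] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)
        return False


def disk_reads(accesses, ram_blocks, ssd_blocks=0):
    """Count accesses that miss every cache tier and must go to disk."""
    ram = LRU(ram_blocks)
    ssd = LRU(ssd_blocks) if ssd_blocks else None
    reads = 0
    for block in accesses:
        if ram.touch(block):
            continue  # served from RAM
        if ssd is not None and ssd.touch(block):
            continue  # served from the SSD tier
        reads += 1    # had to read the slow disk
    return reads


# Working set of 100 blocks scanned five times; RAM holds only 10 of them.
accesses = list(range(100)) * 5
ram_only = disk_reads(accesses, ram_blocks=10)
with_ssd = disk_reads(accesses, ram_blocks=10, ssd_blocks=200)
```

In this contrived scan the RAM-only LRU never gets a hit because the working set is larger than the cache, whereas the larger SSD tier absorbs every access after the first pass. Real workloads are less extreme, but the direction of the effect is the same.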

Facebook releases Flashcache

Posted Apr 30, 2010 23:49 UTC (Fri) by ESRI (subscriber, #52806) [Link]

Really would like to see this used to speed up O_SYNC type access (for NFS writes in sync mode).

Facebook releases Flashcache

Posted May 1, 2010 1:48 UTC (Sat) by koverstreet (✭ supporter ✭, #4296) [Link]

If you do want safe write behind caching, checksumming and journalling are on the list for bcache:
http://lkml.org/lkml/2010/4/30/496
http://lkml.org/lkml/2010/4/30/497
http://lkml.org/lkml/2010/4/30/498
http://lkml.org/lkml/2010/4/30/499

Not as far along as flashcache, but it's moving quickly.

Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds