Flash and small modifications
Posted Apr 5, 2012 9:12 UTC (Thu) by dlang (guest, #313)
Parent article: 2012 Linux Storage, Filesystem, and Memory Management Summit - Day 2
There are many cases where filesystem modifications could take advantage of this capability to modify blocks in place instead of copying the blocks to write the modified versions, which could significantly reduce write amplification.
Two examples that jump to mind are:
sequential writes (i.e. log files): if an append-only file ends in the middle of a block, that block should be able to be modified rather than copied when additional data gets written.
journal files: the journal wants to record that a transaction is pending, and then later indicate that the transaction has been completed (with durable writes at each stage). This is either a sequential write problem (if the journal has separate entries for transaction start and end), or it's a case of wanting to flip a bit in the existing entry to indicate that it has been completed. In either case, some ability to modify an existing block in specific ways would allow you to avoid having to copy the entire eraseblock because a small change took place.
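To make the two cases above concrete, here is a rough Python sketch. It assumes the usual flash model where an erased byte reads as 0xFF and programming can only clear bits (1 to 0) until the next erase. The function name, the block contents, and the one-byte PENDING/COMMITTED status flag are all made up for illustration; real devices also limit how many times a page can be partially programmed.

```python
ERASED = 0xFF  # assumption: an erased flash byte reads as all ones


def can_program_in_place(old: bytes, new: bytes) -> bool:
    """Return True if `new` is reachable from `old` by only clearing bits."""
    if len(old) != len(new):
        return False
    # Every bit set in `new` must already be set in `old`.
    return all((o & n) == n for o, n in zip(old, new))


# Append-only case: the tail of the block is still erased, so new data
# can be programmed into it without touching the bytes already written.
block = b"line 1\n" + bytes([ERASED]) * 9
appended = b"line 1\nline 2\n" + bytes([ERASED]) * 2

# Journal case: a hypothetical status byte that starts out erased
# (pending) and is cleared to zero once the transaction completes.
PENDING = 0xFF
COMMITTED = 0x00
entry_pending = bytes([PENDING]) + b"txn-42"
entry_committed = bytes([COMMITTED]) + b"txn-42"

if __name__ == "__main__":
    print(can_program_in_place(block, appended))
    print(can_program_in_place(entry_pending, entry_committed))
    print(can_program_in_place(entry_committed, entry_pending))
```

Going from committed back to pending would need bits set again, which means an erase, so a check like this is what decides between an in-place update and a copy.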
Checksums (including RAID) make these sorts of changes impossible to do unless you can also modify the checksum, but this could be addressed by either checksumming smaller chunks (and leaving some chunks blank so that the chunk and its checksum can be written later), or by allowing there to be several possible checksums for a chunk, of which you use the most recent one.
In any case, organizing flash with checksums to work this way will be a less efficient use of the flash cells than the current mode, but it would be similar to the idea being floated of there being different classes of storage ('high durability' vs 'low durability'), except this doesn't actually require different types of flash; it could be implemented via different allocation of the bits of the existing flash.
For example, instead of storing 2M of user data per flash page, it could store 1.5M of user data, but have space for many different versions of the checksums so that the user data could be modified many times without having to copy the pages.
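A minimal sketch of the "several possible checksums, use the most recent one" idea, in Python. Everything here is an assumption for illustration: the slot layout (one marker byte plus a 4-byte CRC32), the slot count, and the function names. The marker byte is there so that a blank slot can be told apart from a CRC that happens to be all ones, and filling a slot only clears bits in an erased region, so it fits the program-without-erase model.

```python
import zlib

SLOT_SIZE = 5  # hypothetical layout: 1 marker byte + 4-byte CRC32
BLANK_MARKER = 0xFF  # slot not yet written (erased state)
USED_MARKER = 0x00   # slot holds a checksum


def make_slots(count: int) -> bytearray:
    """Reserve `count` blank checksum slots, all in the erased state."""
    return bytearray(b"\xff" * (SLOT_SIZE * count))


def record_checksum(slots: bytearray, data: bytes) -> int:
    """Write the CRC32 of `data` into the first blank slot; return its index."""
    for i in range(0, len(slots), SLOT_SIZE):
        if slots[i] == BLANK_MARKER:
            crc = zlib.crc32(data).to_bytes(4, "big")
            slots[i:i + SLOT_SIZE] = bytes([USED_MARKER]) + crc
            return i // SLOT_SIZE
    raise RuntimeError("no blank checksum slot left; the chunk must be copied")


def verify(slots: bytearray, data: bytes) -> bool:
    """Check `data` against the most recently written (last used) slot."""
    latest = None
    for i in range(0, len(slots), SLOT_SIZE):
        if slots[i] == USED_MARKER:
            latest = bytes(slots[i + 1:i + SLOT_SIZE])
    if latest is None:
        return False
    return latest == zlib.crc32(data).to_bytes(4, "big")
```

Each in-place modification of the user data consumes one slot, so the number of reserved slots is the number of modifications allowed before the data has to be copied to a fresh location. That is the space efficiency vs modification efficiency tradeoff in miniature.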
The filesystem would need to be tweaked to make use of this (except possibly in the append-only case), because it would need to modify the bits the 'right' way in the 'right' alignment for this to work. Given the tradeoff between space efficiency and modification efficiency, it would probably be wise to have the filesystem tell the block device when it wants data stored in the modification-efficient mode.
