Flash and small modifications

Posted Apr 5, 2012 9:12 UTC (Thu) by dlang (guest, #313)
Parent article: 2012 Linux Storage, Filesystem, and Memory Management Summit - Day 2

At the physical level, flash is very similar to EPROM: erasing it sets the bits to one value (say '0' for this discussion) and writing it flips individual bits (to '1' for this discussion). Except for checksums, there is no reason why you couldn't go back to a previously written block and flip additional bits. With multi-level flash this is a bit more complicated, as one cell in the flash represents multiple bits, and the value of the combined bits in the cell can be increased, but never decreased.
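
As a rough illustration of that rule, here is a minimal Python sketch of single-level flash using the convention of this comment (erase gives 0, programming can only turn a 0 into a 1). The function names are hypothetical and exist only for this example; real controllers expose nothing like this.

```python
def can_program_in_place(old: bytes, new: bytes) -> bool:
    """True if `new` is reachable from `old` by only flipping 0 bits to 1."""
    if len(old) != len(new):
        return False
    # Every bit already set in `old` must still be set in `new`.
    return all((o & n) == o for o, n in zip(old, new))


def program(old: bytes, data: bytes) -> bytes:
    """Model a program operation: the stored value becomes old OR data."""
    return bytes(o | d for o, d in zip(old, data))


erased = bytes(4)                                  # erased block: all bits 0
first = program(erased, b"\x12\x00\x00\x00")       # first write

print(can_program_in_place(first, b"\x12\x34\x00\x00"))  # True: only adds 1 bits
print(can_program_in_place(first, b"\x02\x34\x00\x00"))  # False: would clear a bit
```

Anything that fails the check would still need the usual copy to a freshly erased block.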

There are many cases where filesystem modifications could take advantage of this capability to modify blocks in place instead of copying the blocks to write the modified versions (which could significantly reduce the write amplification issue).

Two examples that jump to mind are:

sequential writes (e.g. log files): if an append-only file is written and ends in the middle of a block, that block should be able to be modified rather than copied when additional data gets written

journal files: the journal wants to record that a transaction is pending, and then later indicate that the transaction has been completed (with durable writes at each stage). This is either a sequential write problem (if the journal has separate entries for transaction start and end), or it's a case of wanting to flip a bit in the existing entry to indicate that it has been completed. In either case, some ability to modify an existing block in specific ways would allow you to avoid having to copy the entire eraseblock because a small change took place.

Checksums (including RAID) make these sorts of changes impossible to do unless you can also modify the checksum, but this could be addressed by either checksumming smaller chunks (and leaving some chunks blank so that the chunk and its checksum can be written later), or by allowing there to be several possible checksums for a chunk, of which you use the most recent one.

In any case, organizing flash with checksums to work this way will be a less efficient use of the flash cells than the current mode, but it would be similar to the idea being floated of there being different classes of storage ('high durability' vs 'low durability'), except this doesn't actually require different types of flash; it could be implemented via a different allocation of the bits of the existing flash.

For example, instead of storing 2M of user data per flash page, it could store 1.5M of user data, but have space for many different versions of the checksums so that the user data could be modified many times without having to copy the pages.
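
The "several possible checksums, use the most recent one" idea could be sketched roughly as below. This is only an illustration under assumptions that are not in the comment: a hypothetical 5-byte slot layout (one marker byte plus a 4-byte checksum), `zlib.crc32` standing in for whatever checksum the device would really use, and the erase-to-0 convention from above so that a blank slot reads as all zeros.

```python
import zlib

SLOT_SIZE = 5   # hypothetical layout: 1 marker byte + 4 checksum bytes


def record_checksum(slots: bytearray, data: bytes) -> bool:
    """Write the checksum of `data` into the first blank slot.

    Returns False when every slot is used, i.e. the page must finally be copied.
    """
    for i in range(0, len(slots), SLOT_SIZE):
        if slots[i] == 0:                       # marker still erased: slot is free
            slots[i] = 0xFF                     # only 0 -> 1 transitions needed
            slots[i + 1:i + SLOT_SIZE] = zlib.crc32(data).to_bytes(4, "big")
            return True
    return False


def latest_checksum(slots: bytearray):
    """Return the most recently written checksum, or None if all slots are blank."""
    latest = None
    for i in range(0, len(slots), SLOT_SIZE):
        if slots[i] == 0:                       # first blank slot ends the list
            break
        latest = int.from_bytes(slots[i + 1:i + SLOT_SIZE], "big")
    return latest


slots = bytearray(SLOT_SIZE * 4)                # room for four versions
data = bytearray(16)                            # erased user data (all zeros)

data[0:5] = b"hello"
record_checksum(slots, bytes(data))
data[5:11] = b" world"                          # in-place append into blank bytes
record_checksum(slots, bytes(data))

assert latest_checksum(slots) == zlib.crc32(bytes(data))
```

The number of slots sets how many in-place modifications a page can take before a copy is unavoidable, which is the space versus modification trade-off described above.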

The filesystem would need to be tweaked to make use of this (except possibly in the append-only case), because it would need to modify the bits the 'right' way in the 'right' alignment for this to work, and given the space efficiency vs modification efficiency, it would probably be wise to have the filesystem tell the block device that it wanted to have data stored in the modification efficient mode.



Flash and small modifications

Posted Apr 6, 2012 10:52 UTC (Fri) by valyala (guest, #41196) [Link] (2 responses)

> sequential writes (e.g. log files): if an append-only file is written and ends in the middle of a block, that block should be able to be modified rather than copied when additional data gets written

I believe sequential writes are already handled optimally by flash firmware. All writes to flash can be easily buffered in on-board RAM and merged into block-sized writes before hitting the flash. Of course, such RAM must be backed by a power supply (a capacitor), which will allow flushing RAM contents to flash in the event of power loss.

Flash and small modifications

Posted Apr 6, 2012 18:07 UTC (Fri) by dlang (guest, #313) [Link] (1 responses)

If they are buffered by on-board RAM, then they are going to be lost if the device loses power. As such, they are not suitable for things like journal writes.

That is, unless the device includes an on-board battery with enough power to write the contents of the RAM to flash when it loses power, and I am not aware of any SSDs that do this.

Flash and small modifications

Posted Apr 6, 2012 18:09 UTC (Fri) by dlang (guest, #313) [Link]

Also, any buffering by on-board RAM will only help if the additional write happens in a short enough time that the data is still in RAM.

Think of a log file that gets a new line written to it every minute or two. If the average log line is ~1/4K (which is what I measured my logs to be), then you have about 2K log entries per block, and so will re-write the block 2K times before filling it up.
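
The arithmetic can be spelled out as follows. The 512 KiB block size is an assumption made here so that the "2K entries per block" figure works out with 256-byte lines; the comment itself does not state a block size, and the sketch takes the worst case where every append rewrites a whole block.

```python
LINE_BYTES = 256              # ~1/4K per log line, as measured in the comment
BLOCK_BYTES = 512 * 1024      # assumed block size (not stated in the comment)

lines_per_block = BLOCK_BYTES // LINE_BYTES

# Worst case: each appended line forces the whole block to be rewritten.
bytes_programmed = lines_per_block * BLOCK_BYTES
amplification = bytes_programmed / BLOCK_BYTES

print(lines_per_block)   # 2048
print(amplification)     # 2048.0
```

With in-place appends, the same block would instead be programmed once per line, in small pieces, with no copies.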

Flash and small modifications

Posted Apr 6, 2012 21:08 UTC (Fri) by jzbiciak (guest, #5246) [Link] (3 responses)

Now, bear in mind my direct experience is mainly with writing low-level flash code on microcontrollers with on-chip flash. But, I can't imagine it's terribly different than the much higher density mass storage flash.

I'll use "set" and "erase" to describe the operations on bits below. In typical flash, "erase" clears the bit to 1, and "set" deposits a charge and causes it to read as 0. Or, you can flip those around with "erase" leaving a 0 and "set" leaving a 1. Either way, the principle holds, and is the same principle you were highlighting.

The flash I've worked with puts an upper bound on the number of times you can set the same bit without an intervening erase cycle. This makes a certain amount of sense to me. The process of flipping a bit from erased to set deposits electrons onto an otherwise-floating transistor gate. Setting the same bit more than once could threaten to break down that gate prematurely by depositing too much charge. At least, I guess that's the reason behind the admonition to not re-program a given word more than once.

That said, write blocks tend to be much smaller than erase blocks, so that does give you some flexibility. So, while a given bit can only be "set" a certain number of times, the hardware generally gives you some granularity to avoid setting it too often.

Anyway, that constraint can limit the number of times you can "modify in place" a given block, and limit the ways in which that modification could be carried out. To me, the most interesting scenario is the "append to log" scenario. This one seems most compatible with the underlying physics. It does require the filesystem to pad sectors beyond EOF with an appropriate value for rewriting later. For example, if flash needs 1s, then the portion of a sector beyond EOF written by the filesystem needs to be 0xFF.
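
A minimal sketch of that append-to-log case, assuming flash that erases to 1s and a hypothetical 512-byte sector. The helper names are made up for this example. The point is only that bytes beyond EOF are left at 0xFF so a later program operation can fill them in without an erase.

```python
SECTOR = 512      # assumed sector size for the example
ERASED = 0xFF     # flash that erases to all 1s


def new_sector(data: bytes) -> bytearray:
    """Build a sector image: file data followed by 0xFF padding beyond EOF."""
    if len(data) > SECTOR:
        raise ValueError("data larger than a sector")
    return bytearray(data) + bytearray([ERASED]) * (SECTOR - len(data))


def append_in_place(sector: bytearray, eof: int, more: bytes) -> int:
    """Program `more` into the still-erased bytes at `eof`. Returns the new EOF."""
    end = eof + len(more)
    if end > SECTOR:
        raise ValueError("append does not fit in this sector")
    for i, byte in enumerate(more):
        if sector[eof + i] != ERASED:
            raise ValueError("target bytes are not blank; a copy is needed")
        sector[eof + i] &= byte   # programming can only clear bits (1 -> 0)
    return end


sector = new_sector(b"line 1\n")
eof = append_in_place(sector, 7, b"line 2\n")
assert bytes(sector[:eof]) == b"line 1\nline 2\n"
```

Each byte is programmed exactly once here, which stays within the "don't re-program a given word more than once" limit mentioned above, provided the write granularity of the part is fine enough.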

As far as checksums go, if you go with CRCs, they have a really nice property: They're fully linear codes. The CRC for a given block of data is equal to the XOR of the CRCs for each of the 1 bits in that block if you were to assume the other bits were zero. (Assuming no pre/post inversions.) That is, suppose you wanted to CRC this string: 10010001. Then: CRC(10010001) = CRC(10000000) ^ CRC(00010000) ^ CRC(00000001).
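
The identity is easy to check with a small bitwise CRC. The sketch below assumes the common CRC-32 generator polynomial with a zero initial value and no final XOR (the "no pre/post inversions" case); `crc32_raw` is a name chosen for this example, not a library function.

```python
POLY = 0x04C11DB7   # CRC-32 generator polynomial, MSB-first form


def crc32_raw(data: bytes) -> int:
    """Bitwise CRC-32 with zero initial value and no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


whole = crc32_raw(bytes([0b10010001]))
parts = (crc32_raw(bytes([0b10000000]))
         ^ crc32_raw(bytes([0b00010000]))
         ^ crc32_raw(bytes([0b00000001])))
assert whole == parts
```

The same holds for any two messages of equal length: the CRC of their XOR equals the XOR of their CRCs. With the usual pre/post inversions (as in `zlib.crc32`) the relation picks up a constant term, which is why the raw variant is used here.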

So, you could protect a block of data that's updated in this way by just appending the CRC of the delta to the "CRC list". The reader would then need to XOR all of the provided CRCs. Now, if your flash erases to 1 and sets to 0, then you could store CRCs inverted, with the unused CRCs reading as 0xFFFFFFFF. The reader wouldn't even have to know how many CRCs are valid at that point -- it could just read them all in and XOR them. If there's an even number of potential CRCs (as seems likely), the reader wouldn't even need to apply any inversions.
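
Here is a rough sketch of such a "CRC list", assuming flash that erases to 1, an even number of 32-bit slots, and the same raw CRC (zero initial value, no final XOR). The slot count and helper names are hypothetical. The first slot holds the CRC of the initial block image and each later slot holds the CRC of an XOR delta.

```python
POLY = 0x04C11DB7
NUM_SLOTS = 8                      # even, so the stored inversions cancel out
BLANK = 0xFFFFFFFF                 # an erased slot; also the inverse of CRC 0


def crc32_raw(data: bytes) -> int:
    """Bitwise CRC-32 with zero initial value and no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def add_crc(slots: list, crc: int) -> None:
    """Store `crc` inverted in the first erased slot."""
    for i, slot in enumerate(slots):
        if slot == BLANK:
            slots[i] = ~crc & 0xFFFFFFFF
            return
    raise RuntimeError("CRC list full; the block has to be copied")


def combined_crc(slots: list) -> int:
    """XOR every slot, used or not. No un-inverting needed for an even count."""
    result = 0
    for slot in slots:
        result ^= slot
    return result


block = bytearray(b"\xFF" * 64)            # freshly erased data area
slots = [BLANK] * NUM_SLOTS
add_crc(slots, crc32_raw(block))           # CRC of the initial image

updated = bytearray(block)
updated[0:5] = b"hello"                    # in-place write into erased bytes
delta = bytes(a ^ b for a, b in zip(block, updated))
add_crc(slots, crc32_raw(delta))           # append only the CRC of the delta

assert combined_crc(slots) == crc32_raw(updated)
```

A delta whose CRC happens to be zero is stored as 0xFFFFFFFF and so looks like an unused slot, but that is harmless in this scheme because it contributes nothing to the XOR either way.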

Flash and small modifications

Posted Apr 6, 2012 22:08 UTC (Fri) by dlang (guest, #313) [Link] (2 responses)

I was thinking of taking the easy way out and just saying "use the most recent checksum" rather than doing any fancy chaining of the checksums together.

But overall I think that we are talking basically the same thing.

I first ran into this sort of concept many years ago when digital recording first hit the consumer market (if you can remember when the first music Christmas cards and similar appeared, that timeframe). The way those recorders worked was that they used an EEPROM chip and recorded an analog value (~8 bits worth) in each bit of the EEPROM by hitting it repeatedly with a smaller-than-spec programming charge until the stored value matched the desired analog value. When programming this way, you would 'program' the chip with 00000001 until the first bit got to the desired value, then program it with 00000010 until the second bit got to the desired value, etc. You would never send a programming signal to a bit that you had already set.

Multi-level flash uses a similar trick to store 4 levels of signal per cell, which it calls '00', '01', '10', and '11' in the digital world. The 'partial modification' that I am talking about would therefore require support from the flash chipset to allow it to not try to reprogram a cell that's already been set, but that is a pretty trivial thing to do.
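
For multi-level cells the "can only increase" rule applies per cell rather than per bit. The sketch below is a simplified model that assumes each cell's level is just the numeric value of its two bits, as described above; real parts map levels to bit patterns in their own ways, so the function names and the mapping are illustrative only.

```python
def to_cells(data: bytes) -> list:
    """Split bytes into 2-bit cell levels (0..3), most significant pair first."""
    return [(byte >> shift) & 0b11 for byte in data for shift in (6, 4, 2, 0)]


def can_update_mlc(old: bytes, new: bytes) -> bool:
    """True if no cell would have to move to a lower level."""
    if len(old) != len(new):
        return False
    return all(n >= o for o, n in zip(to_cells(old), to_cells(new)))


old = bytes([0b01001000])                          # cell levels 1, 0, 2, 0
assert can_update_mlc(old, bytes([0b01111001]))    # levels 1, 3, 2, 1: allowed
assert not can_update_mlc(old, bytes([0b00111001]))  # first cell 1 -> 0: refused

# Unlike the single-level case, a bit may flip back as long as the level rises:
assert can_update_mlc(bytes([0b01000000]), bytes([0b10000000]))  # level 1 -> 2
```

Cells whose level is unchanged are exactly the ones the chipset would need to skip rather than reprogram.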

Flash and small modifications

Posted Apr 6, 2012 22:46 UTC (Fri) by jzbiciak (guest, #5246) [Link] (1 responses)

Where I think the overlaid CRCs might become more interesting is when the CRC is supposed to cover a fairly large block (say 64K) but you're rewriting a much smaller piece (say 4K) with a minor update. If you don't have the other pieces handy, you can compute your CRC update on just the piece you have, rather than having to go read them.
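
As a sketch of that partial update: with a raw CRC (zero initial value, no final XOR), the CRC contribution of a changed piece can be computed from the old and new contents of that piece alone. The zeros before the piece leave the CRC state at 0, and the zeros after it only need to be clocked through, not read from flash. The sizes below are scaled down from the 64K/4K example to keep the pure-Python loop fast, and the function names are hypothetical.

```python
POLY = 0x04C11DB7


def crc32_raw(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32, no final XOR; `crc` lets a computation be continued."""
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def delta_crc(old_piece: bytes, new_piece: bytes, offset: int, block_len: int) -> int:
    """CRC of the whole-block XOR delta, using only the piece that changed."""
    delta = bytes(a ^ b for a, b in zip(old_piece, new_piece))
    crc = crc32_raw(delta)                  # leading zeros contribute nothing
    trailing = block_len - offset - len(delta)
    return crc32_raw(bytes(trailing), crc)  # clock in the trailing zero bytes


BLOCK, PIECE, OFFSET = 1024, 64, 256        # scaled-down stand-ins for 64K / 4K
old_block = bytes(i & 0xFF for i in range(BLOCK))
new_piece = bytes([0xA5]) * PIECE
new_block = old_block[:OFFSET] + new_piece + old_block[OFFSET + PIECE:]

update = delta_crc(old_block[OFFSET:OFFSET + PIECE], new_piece, OFFSET, BLOCK)
assert crc32_raw(new_block) == crc32_raw(old_block) ^ update
```

The trailing zeros are fed one byte at a time here for clarity; that step is pure computation and touches none of the other pieces.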

Flash and small modifications

Posted Apr 6, 2012 23:02 UTC (Fri) by dlang (guest, #313) [Link]

True, but since it's only the on-device controller that would have to re-read the data, I don't think it really matters. It can re-read the raw data fast enough that I really don't expect it to be a bottleneck.

Flash and small modifications

Posted Apr 12, 2012 16:31 UTC (Thu) by wookey (guest, #5501) [Link] (1 responses)

7 or so years ago this was true: you could rewrite data and take advantage of the fact that you could keep setting more bits. YAFFS1 used this as the mechanism for marking blocks superseded. But as soon as we got past 512-byte-page flash to 2K-page flash, rewriting was deprecated by the manufacturers. Even just one rewrite was not encouraged, and the direction of travel was clear, so YAFFS2 changed the mechanism for block marking and never rewrote anything. By the time we got to MLC flash there was no rewriting permitted.

So whilst your theory is fine, in practice this hasn't been any use for years.

If I'd been in the audience I'd have been in the 'give us raw access' group. Or at least very direct control over what 'optimisations' the device will do itself. My experience is that Linux filesystem writers can do a much better job of getting this right than flash vendors, who really don't have the same optimisation parameters as us at all. Mostly their efforts to do things for us have produced a lot of shitty (slow, unreliable) flash in SD cards.

On the other hand it is clear that some things are better done on the device (checksumming/ECC for a start), and potentially some other stuff (the way modern disk drives by-and-large do a reasonable job internally without messing things up).

But if they just told us the block sizes that would be a good start. The 5 years this hasn't been happening for has been a terrible waste. I bet we have legal dept and patents to thank for that, as well as generally not caring about anything other than 'FAT in cameras'.

Flash and small modifications

Posted Apr 12, 2012 19:31 UTC (Thu) by dlang (guest, #313) [Link]

I'm expecting that this would require explicit support by the manufacturers. At the very least from the device manufacturers if not by the chip manufacturers.

The question is if this would help enough to matter.

There are two areas I see it potentially helping:

1. longer lifetime due to reduced wear by not having to copy the blocks as much

2. better performance by being able to update the blocks and therefore avoid the need to do as many slow erase cycles (the device tries to do these in the background, but if there are enough writes on a full enough device, it has to wait for them)

