possibilities with raw flash access

Posted Aug 22, 2009 10:16 UTC (Sat) by dlang (guest, #313)
In reply to: The trouble with discard by markusw
Parent article: The trouble with discard

as you note, every flash drive available has a remapping layer hiding the details from the OS.

but let me point out some things that _could_ be done with either raw access to the flash, or some more smarts in the remapping layer

the key factor is that flash does not always need to be erased before it's changed

SLC flash can change any bit from a 1 to a 0 without needing to erase the block first

MLC flash can change any pair of bits from 11 to 00 (I think the order is 11 -> 10 -> 01 -> 00 but I'm not sure) without needing to erase the block first.

when a block is modified the hardware could compare the new data to the old data, and if the only differences are 1->0 transitions, it could modify the existing block rather than writing a new version elsewhere
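a minimal sketch of that comparison in Python, assuming the old and new contents are available as equal-length byte strings. the function name is made up for illustration; real controllers would do this in firmware, if they do it at all.

```python
def can_update_in_place(old: bytes, new: bytes) -> bool:
    """Return True if going from old to new only clears bits (1 -> 0)."""
    if len(old) != len(new):
        return False
    # any bit that is 0 in old but 1 in new would need an erase first
    return all((n & ~o & 0xFF) == 0 for o, n in zip(old, new))


# appending into erased (all-1s) space only clears bits
print(can_update_in_place(b"\x12\x34\xff\xff", b"\x12\x34\x56\xff"))  # True
# 0x12 -> 0x13 needs bit 0 to go from 0 to 1
print(can_update_in_place(b"\x12", b"\x13"))  # False
```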

if the hardware supported this, then the OS could take advantage of the capability to reduce the number of erases necessary

the filesystem could:

leave the unused space at the end of a block as all '1's rather than all '0's so that additional data could be appended without needing to erase a block first.

change its 'nothing more to point to' marker from a pointer containing all 0s to a pointer containing all 1s so that adding an additional block to a chain (or extent..) would not require re-writing the prior block as well

make a space/rewrite tradeoff in favor of reducing rewrites by allocating space for multiple copies of frequently changed metadata so that the entire block only needs to be re-written when all the extra slots have been used up.

as a trivial example of this last one: with atime enabled, every time a file is accessed it requires a rewrite of the entire eraseblock to record the new time. if you need a sync mount for data reliability, this could result in a rewrite for each file that's looked at

if however you had 10 atime slots, you would only do a rewrite after accessing a particular file 10 times, and if you have a sync mount you would only have to do a rewrite after doing 10 passes through all the files (each file accessed would modify an atime slot, but until all 10 slots are full for any one file the block would not need to be moved. when the filesystem overflows the available slots on one file it can clean up all the other files in that block at the same time)
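a rough sketch of the slot idea in Python. everything here (the class name, 32-bit timestamps, the all-1s 'empty' marker) is an assumption for illustration, not an existing filesystem layout.

```python
EMPTY = 0xFFFFFFFF  # an erased 32-bit slot reads back as all 1s


class AtimeSlots:
    """Hypothetical per-file atime record with several write-once slots."""

    def __init__(self, nslots=10):
        self.slots = [EMPTY] * nslots
        self.rewrites = 0  # how often the block had to be rewritten

    def current(self):
        """The newest atime is the last slot that has been filled in."""
        used = [s for s in self.slots if s != EMPTY]
        return used[-1] if used else None

    def touch(self, timestamp):
        # timestamp must not equal EMPTY, or the slot would look unused
        for i, s in enumerate(self.slots):
            if s == EMPTY:
                # filling an erased slot only changes bits from 1 to 0
                self.slots[i] = timestamp
                return
        # every slot is used: rewrite the block with fresh, empty slots
        self.rewrites += 1
        self.slots = [timestamp] + [EMPTY] * (len(self.slots) - 1)


rec = AtimeSlots(nslots=10)
for i in range(25):
    rec.touch(1000 + i)
print(rec.rewrites, rec.current())
```

with 10 slots, 25 accesses cost 2 rewrites in this model instead of one per access.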

similar tricks could be done for size (either multiple slots or size+delta+delta approaches)

exactly what metadata should be given extra slots, and how many slots, is an interesting problem to consider and experiment with (and probably is going to be different for different use cases as well)

if the hardware can tell the filesystem where the eraseblock boundaries are then there are more optimizations that can take place

a couple of side notes. the musical greeting cards and similar cheap recorder chips that became available in the 1990s actually worked by using eprom chips that had programming properties similar to flash: when you erased them you got all 1s, and then by programming you could change a 1 to a 0. the recording capability showed up when someone realized that you didn't have to program them all the way to a 0. like flash, they actually store an analog value, and by rapidly sending programming pulses to the device (up to 100 per bit) you could adjust the output voltage to match the audio sample. then to play it back you just cycle through the addresses and amplify the analog voltage produced.

MLC flash takes advantage of a similar thing: it doesn't just program the flash cell to a true 1 or 0, it can also program it to one of two additional analog values. it then labels the original '1' as '11', the original '0' as '00', and the two additional values as '10' and '01'. the difficulty is that it's now harder to tell the different voltages apart.

I expect that MLC flash is going to climb in capacity rapidly as the manufacturers copy ideas from the history of modems

1. more values in a particular slot (what MLC does today vs SLC) as the ability to distinguish (and program) voltages that are close together gets better (similar to how modems got faster by distinguishing more different tones as they went from 1200bps to 9600bps)

right now I believe that flash programming is mostly (if not entirely) a case of 'hit it with one programming pulse to change the cell'. I expect that things will shift to 'hit it with a series of short programming pulses, checking between each pulse, until the cell gets to the desired voltage'. doing this will increase complexity, and may slow down writes slightly (in some cases it may speed up writes, since with a single pulse the programming pulse needs to cover the 'worst case' needs, while the new approach can avoid 'overprogramming' the cell), but will result in more precise control of the cell voltage.

2. combining adjacent flash cells and defining that only some of the range of possible bit patterns are legal, allowing the use of voltages in an individual cell that could be ambiguous, but are no longer ambiguous when combined with the data from the adjacent cell (similar to how modems shifted from pure tone detection to tone/phase detection with only some combinations being legal to allow for easier detection as they went above 9600bps)
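the pulse-and-check loop can be sketched like this in Python. the cell is just a simulated number in millivolts, and the step size and pulse limit are made-up values, not figures from any datasheet.

```python
def program_cell(target_mv, step_mv=50, start_mv=0, max_pulses=100):
    """Pulse a simulated cell until it reaches target_mv.

    Returns (final level in mV, number of pulses used).
    """
    level = start_mv
    pulses = 0
    # the loop condition plays the role of the verify read between pulses
    while level < target_mv and pulses < max_pulses:
        level += step_mv  # one short programming pulse
        pulses += 1
    return level, pulses


print(program_cell(500))               # (500, 10)
print(program_cell(520))               # (550, 11)
print(program_cell(520, step_mv=10))   # (520, 52)
```

the overshoot is always less than one step, so shorter pulses give finer control at the cost of more pulses per write.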
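a toy illustration of idea 2 in Python. the 'only even sums are legal' rule is invented here to keep the example small; it is not how any real flash or modem coding works, and all names are hypothetical.

```python
from itertools import product

LEVELS = range(4)  # four nominal levels per cell, as in 2-bit MLC

# toy rule (an assumption for illustration): a pair of cells is legal only
# if the two levels add up to an even number, so 8 of the 16 pairs are used
LEGAL_PAIRS = [(a, b) for a, b in product(LEVELS, LEVELS) if (a + b) % 2 == 0]


def decode_pair(read_a, read_b):
    """Return the legal pair closest to the two analog readings."""
    return min(
        LEGAL_PAIRS,
        key=lambda p: (read_a - p[0]) ** 2 + (read_b - p[1]) ** 2,
    )


# 1.5 on its own is ambiguous between level 1 and level 2, but the
# neighbouring cell clearly reads 2, and (1, 2) is not a legal pair
print(decode_pair(1.5, 2.1))  # (2, 2)
```

the price in this toy version is capacity: a pair carries 3 bits instead of 4, in exchange for being able to resolve readings that a single cell could not.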

possibilities with raw flash access

Posted Aug 22, 2009 20:13 UTC (Sat) by phip (guest, #1715)

> leave the unused space at the end of a block is left as all '1's
> rather than all '0's so that additional data could be appended
> without needing to erase a block first.

> change it's 'nothing more to point to' from a pointer containing
> all 0s to a pointer containing all 1s so that adding an additional
> block to a chain (or extent..) would not require re-writing the
> prior block as well

Would simply inverting the raw data from/to the flash device
before/after doing any block device or filesystem processing
be a useful optimization?
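As a sketch of what I mean, in Python (the helper names are made up, and this assumes the device really can clear bits in place):

```python
def invert(data: bytes) -> bytes:
    """Flip every bit; applying it twice gives back the original."""
    return bytes(b ^ 0xFF for b in data)


def only_clears_bits(old: bytes, new: bytes) -> bool:
    """True if old -> new never needs a bit to go from 0 to 1."""
    return all((n & ~o & 0xFF) == 0 for o, n in zip(old, new))


# what the filesystem sees: a block padded with zeros, then data appended
logical_old = b"ab\x00\x00"
logical_new = b"abcd"

# stored as-is, the append needs 0 -> 1 transitions
print(only_clears_bits(logical_old, logical_new))                   # False
# stored inverted, the zero padding is all 1s on the flash
print(only_clears_bits(invert(logical_old), invert(logical_new)))   # True
```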

possibilities with raw flash access

Posted Aug 22, 2009 20:27 UTC (Sat) by dlang (guest, #313)

two things

it would only help if the device is smart enough to not allocate a new block on the flash (requiring an erase of a block eventually) if the data changes are only 1->0

inverting all the data will thrash your cpu cache, and most things don't benefit from the change, so I think it would be smarter to modify the filesystem

possibilities with raw flash access

Posted Aug 25, 2009 6:33 UTC (Tue) by markusw (guest, #53024)

If you have access to the raw NAND device (i.e. an MTD device, not a block device as seen from the Linux kernel) the very first thing to use properly is block discard. The 'partial block reuse' technique described might improve things as well, but compared to block discard it can only be a minor optimization. Also keep in mind that block erasure is a specified operation which just needs to be used by the layers above. On the contrary, I'm not quite convinced that 'overwriting' of blocks is properly supported on all devices.

However, apart from the mentioned Fusion-IO devices (and I'm not even sure about those, as they have proprietary drivers) I don't know of any device for commodity computers which allows raw NAND access. I've only seen that in the embedded world, and there we normally speak of just one single chip with some MiBs of storage capacity. So this discussion is theoretical anyway.

Apart from wanting to have raw access to NAND devices, I'm also wondering about the latency implications of the SATA protocol involved. Cutting that out and attaching the NAND more directly to PCI Express certainly can't hurt.

possibilities with raw flash access

Posted Aug 27, 2009 18:07 UTC (Thu) by robert_s (subscriber, #42402)

"I don't know any device for commodity computers which allows raw NAND access."

It's not pretty, but there used to be a USB XD card reader chip called alauda which would expose the XD/SM card (they are both basically repackaged NAND chips) as an MTD. Linux drivers are in the tree, I think.

Useful for little more than experimentation. And only then if you're good at hunting for one on ebay.

possibilities with raw flash access

Posted Aug 25, 2009 11:48 UTC (Tue) by markusw (guest, #53024)

I'm just remembering another issue with this approach: you also need to take ECC into account. Flipping a data bit from '1' to '0' may require flipping an ECC bit back from '0' to '1'.

Thus, for such an approach to work, you'd also need to have control over the spare area, where ECC and bad block information is normally stored.
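A small Python illustration of the problem, using a single even-parity bit per byte as a stand-in for a real ECC. This is an assumption made to keep the example short; actual NAND ECC uses stronger codes over larger chunks, but the effect is the same in kind.

```python
def even_parity_bit(byte: int) -> int:
    """Parity bit chosen so that data plus parity has an even number of 1s."""
    return bin(byte & 0xFF).count("1") % 2


old_data, new_data = 0xFF, 0xFE   # one data bit goes from 1 to 0

old_ecc = even_parity_bit(old_data)
new_ecc = even_parity_bit(new_data)

print(old_ecc, new_ecc)  # 0 1
# the data change alone could be done in place, but the parity bit
# has to go from 0 back to 1, which is not possible without an erase
print(new_ecc > old_ecc)  # True
```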

possibilities with raw flash access

Posted Aug 28, 2009 16:37 UTC (Fri) by dlang (guest, #313)

good point. if the flash device is doing ECC on the data being stored, my suggestions are harder to implement, if they are still possible at all.

it will depend on the level and type of the ECC. if an algorithm is in place on a per-byte level that lets the ECC bits be '1's when the data is all '1's then you could still do per-byte modifications
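a sketch of what I have in mind, in Python. the check bit is a made-up stand-in for a real ECC: one inverted parity bit per byte, picked only because it is 1 when the data byte is all 1s.

```python
def inverted_parity_bit(byte: int) -> int:
    """Hypothetical per-byte check bit that is 1 for an erased (0xFF) byte."""
    ones = bin(byte & 0xFF).count("1")
    return 1 - (ones % 2)


def writable_in_place(old: int, new: int) -> bool:
    """True if neither the data nor the check bit needs a 0 -> 1 change."""
    data_ok = (new & ~old & 0xFF) == 0
    ecc_ok = inverted_parity_bit(new) <= inverted_parity_bit(old)
    return data_ok and ecc_ok


# an erased byte is a valid codeword, so its first write is always fine
print(all(writable_in_place(0xFF, value) for value in range(256)))  # True
# a second change to the same byte can still need the check bit set again
print(writable_in_place(0xFE, 0xFC))  # False
```

so with a scheme like this, filling in bytes that are still erased works, but changing a byte that has already been written may not.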