The trouble with discard

Posted Aug 19, 2009 14:31 UTC (Wed) by ikm (subscriber, #493)
Parent article: The trouble with discard

> In the absence of discard-like functionality, an SSD will end up shuffling around data that the host system has long since stopped caring about; telling the device about unneeded blocks should result in better performance.

Actually, does anyone understand how a device could shuffle anything at all in the absence of discard information? Imagine we have an 8GB drive, pristine-clean. We write to block 0 several times - ok, it can remap it to physical block 0 on the first write, physical block 1 on the next write, and so on. This is all well and good, but what happens once we have performed writes to all logical blocks? Now we have an 8GB drive fully filled with some opaque data. It's not possible to write to different physical blocks now, because they are all used up. So a sequence of writes to any one particular logical block could only be translated to a sequence of writes to one particular physical block, which would be the same each time -- it's not possible to shuffle anything around, because all the physical blocks are filled with meaningful data already! So could someone please-please tell me just how all that FTL "magical wear leveling" could work at all? From what I see, the discard-like information is *totally* required -- it's not possible to do any shuffling at all once you have written some opaque data to all the logical blocks available and the number of physical blocks is not significantly higher than the number of logical ones.

The trouble with discard

Posted Aug 19, 2009 15:44 UTC (Wed) by ken (subscriber, #625) [Link]

The disk needs several lists: one is the logical-to-physical block list (the mapping list), one is the clean list, and one is the trash list.

Now the total number of physical blocks (mapped + clean + trash) needs to be larger than the logical list, otherwise it will never work.

If you had a drive with 10 physical blocks, all in the clean list, the mapping list must be at least 1 block shorter, but drives probably set aside a lot more than just one block.

Anyway, your first write would take one block from the clean list and put it in the mapping list so that block 0 now points to it. Your next write would take a new block from the clean list, map block 0 to that instead, and put the old block on the trash list.

If you write another time to block 0, yet another block would be taken from the clean list, the mapping would be changed, and the previous block put on the trash list.

In the background the drive would erase any blocks on the trash list and put them back on the clean list.

Writes that happen when no clean blocks exist have to wait for the erase to finish.

Now reality is more complex, as the filesystem block size and the flash block size are not the same, and you sometimes need to move data around to even out the write count per block, as there is a limited number of erases that can be done to a block.

But the disk does not need to know which blocks are used for this to work; it simply assumes that the entire disk is used. If it could know, it could put a lot more blocks on the clean list and would not need to set aside as many blocks for performance reasons. It still needs extra blocks, as flash blocks are unreliable and may fail after just 1 erase if you are unlucky.
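
The three lists described above can be sketched in a few lines of Python. This is only a toy model, not how any real drive firmware works: the class name `SimpleFTL`, its methods, and the choice to erase synchronously when the clean list is empty are all assumptions made for illustration.

```python
from collections import deque


class SimpleFTL:
    """Toy flash translation layer with a mapping, a clean and a trash list.

    Hypothetical sketch: names and structure are illustrative only.
    """

    def __init__(self, physical_blocks, logical_blocks):
        # There must be more physical than logical blocks, so that the
        # clean and trash lists can never both be empty.
        assert logical_blocks < physical_blocks
        self.logical_blocks = logical_blocks
        self.mapping = {}                           # logical -> physical
        self.clean = deque(range(physical_blocks))  # erased, ready to write
        self.trash = deque()                        # stale, waiting for erase
        self.erase_count = [0] * physical_blocks
        self.data = [None] * physical_blocks

    def erase_one(self):
        """Erase one trashed block and put it back on the clean list."""
        phys = self.trash.popleft()
        self.data[phys] = None
        self.erase_count[phys] += 1
        self.clean.append(phys)

    def write(self, logical, payload):
        if not 0 <= logical < self.logical_blocks:
            raise IndexError(logical)
        if not self.clean:
            # No clean block left: the write has to wait for an erase.
            self.erase_one()
        phys = self.clean.popleft()
        self.data[phys] = payload
        old = self.mapping.get(logical)
        self.mapping[logical] = phys
        if old is not None:
            self.trash.append(old)  # the previous copy is now stale

    def read(self, logical):
        return self.data[self.mapping[logical]]
```

Filling all logical blocks of a 10-block device that exposes 8, and then rewriting block 0 over and over, only ever erases the three physical blocks that cycle through the clean and trash lists in this model. That is the situation the follow-up comment asks about.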

The trouble with discard

Posted Aug 19, 2009 16:26 UTC (Wed) by ikm (subscriber, #493) [Link]

Yes. But that does not explain what would happen after a single write to each existing block has been performed. The clean list would be exhausted and the trash list would be empty. Then what? Yes, there might be some 1-5% extra physical blocks reserved, but they still won't do much good - basically, the wear leveling would only be done in that small zone, which would wear out really fast, and that is not what wear leveling is supposed to be.

I am personally inclined to think that the whole FTL thing is a hoax. It might work in some cases (e.g. you have one file on your FAT filesystem and you constantly overwrite it), but in general I don't see how it could work. Once you fill all your flash with files completely, there doesn't seem to be a way to perform any real wear leveling any more. You can erase all the files -- it won't help, the underlying device would never know you did; it's a logical operation which doesn't get propagated down to it. For the device, all the space continues to be filled up. There doesn't seem to be much it can do then.

So when our editor writes that "an SSD will end up shuffling around data that the host system has long since stopped caring about", I don't understand just how an SSD would do that at all. Seems that everyone is just happy to believe that it can be done somehow, because the marketing suggests so.

The trouble with discard

Posted Aug 19, 2009 17:06 UTC (Wed) by foom (subscriber, #14868) [Link]

If the disk is 100% full, and you continuously overwrite a single block, it can decide to move a non-changing block into that location so as to stop its write-count from going up, and remap the frequently changing block into the previously unchanging location.

This, of course, requires using extra write bandwidth, but it can allow the entire disk to be evenly written, even when 100% full. The changing block will get mapped into each location on the disk in turn.

The trouble with discard

Posted Aug 19, 2009 17:15 UTC (Wed) by ikm (subscriber, #493) [Link]

Thanks -- this answers it.

The trouble with discard

Posted Aug 19, 2009 17:11 UTC (Wed) by farnz (subscriber, #17727) [Link]

Take a flash device with 10 blocks. Expose 8 blocks of space; the remaining two blocks go on the clean list. To wear-level an apparently full device, pick the block of the 8 with fewest erase cycles, copy the data into one of your two clean blocks, mark the newly written block as being the logical equivalent of the old block, then put the old block on the trash list.

At the expense of using slightly more erase cycles than would otherwise be the case, I can ensure that all blocks have roughly the same number of erase cycles; this prevents you from wearing out a single block.
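
A minimal sketch of that single wear-leveling step, in Python. The function name `wear_level_step` and the plain dict/list data structures are hypothetical. Choosing the most-worn clean block as the destination is an added assumption, since the description above only says "one of your two clean blocks".

```python
def wear_level_step(mapping, clean, trash, erase_count, data):
    """Relocate the least-erased mapped block into a clean block.

    mapping:     dict, logical block -> physical block
    clean/trash: lists of physical block numbers
    erase_count: list of erase counts, indexed by physical block
    data:        list of block contents, indexed by physical block
    Returns (old_physical, new_physical), or None if nothing is clean.
    """
    if not clean:
        return None
    # Pick the exposed block whose physical block has the fewest erases.
    logical = min(mapping, key=lambda lb: erase_count[mapping[lb]])
    old = mapping[logical]
    # Assumption: park the cold data on the most-worn clean block, so the
    # worn block gets a rest and the fresh one re-enters circulation.
    new = max(clean, key=lambda pb: erase_count[pb])
    clean.remove(new)
    data[new] = data[old]     # copy the data
    mapping[logical] = new    # new block is now the logical equivalent
    trash.append(old)         # old block waits for its erase
    return old, new


# Ten physical blocks, eight exposed, two clean but already worn.
mapping = {lb: lb for lb in range(8)}
clean, trash = [8, 9], []
erase_count = [0] * 8 + [5, 7]
data = ["blk-%d" % i for i in range(8)] + [None, None]
moved = wear_level_step(mapping, clean, trash, erase_count, data)
```

Once the old block has been erased and returned to the clean list, ordinary writes land on it, which is how wear reaches blocks that the host never rewrites.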

The trouble with discard

Posted Aug 19, 2009 17:24 UTC (Wed) by nybble41 (subscriber, #55106) [Link]

First, some physical sectors are reserved ahead of time, so there are always some sectors on either the trash list or the clean list (or both). In the very cheapest and oldest models of flash drives active sectors are only remapped when written to, so you can end up with the situation you described, with wear-leveling restricted to the small area of the device not occupied by fixed data. These sorts of devices tend to wear out rather quickly under certain (common) use-cases.

However, in the better flash devices the FTL will remap even the unchanging sectors over time such that the wear-leveling is spread across all the device's erase sectors. These devices approach the ideal erase/rewrite limit (number of sectors * rewrites per sector).

For example, assume we have a flash device with five sectors: four visible plus one spare. After writing to the full device there are four active sectors (phys. 1-4) and one on the clean list (5). Rewriting logical sector 3 changes the active mapping to (1->1, 2->2, 3->5, 4->4), empties the clean list, and adds phys. sector 3 to the trash list. Rewriting logical sector 3 again with a simple FTL just causes phys. sector 3 to be erased and swapped with sector 5, which goes on the trash list; none of the other sectors are wear-leveled. With a better FTL, however, the device may decide to place the new data in phys. sector 1, moving the original data for the first logical sector over to phys. sector 5 instead (active: 1->5, 2->2, 3->1, 4->4; trash: 3). This introduces an extra erase operation (on *some* writes), but now the changes are spread across three sectors rather than the original two, and additional writes would be further spread to sectors 2 and 4 as well. The end result is that all the sectors end up with similar numbers of rewrites.
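
The five-sector example can be turned into a small simulation that counts erases per physical sector. This is a simplified variant rather than the exact sequence of moves above: the function `simulate`, the `threshold` parameter, and the policy of parking cold data in the worn spare before each write are assumptions chosen to keep the sketch short.

```python
def simulate(rewrites, static_leveling, threshold=2):
    """Rewrite logical sector 3 repeatedly; return erases per phys. sector."""
    mapping = {1: 1, 2: 2, 3: 3, 4: 4}    # logical -> physical
    spare = 5                              # the single clean sector
    erases = {p: 0 for p in range(1, 6)}
    hot = 3                                # the logical sector being rewritten
    for _ in range(rewrites):
        if static_leveling:
            # Least-worn physical sector that holds unchanging data.
            cold = min((lb for lb in mapping if lb != hot),
                       key=lambda lb: erases[mapping[lb]])
            if erases[spare] - erases[mapping[cold]] >= threshold:
                # Copy the cold data into the worn spare, then erase the
                # barely used sector so it becomes the new spare.
                old = mapping[cold]
                mapping[cold] = spare
                erases[old] += 1
                spare = old
        # Ordinary rewrite: new data goes to the spare; the sector holding
        # the stale copy is erased and becomes the next spare.
        old = mapping[hot]
        mapping[hot] = spare
        erases[old] += 1
        spare = old
    return erases


simple = simulate(1000, static_leveling=False)
leveled = simulate(1000, static_leveling=True)
print("simple :", simple)
print("leveled:", leveled)
```

In this toy model the simple FTL puts all of the wear on physical sectors 3 and 5, while the leveling variant keeps every sector within a couple of erases of the others at the cost of more erases in total.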

The trouble with discard

Posted Aug 19, 2009 18:35 UTC (Wed) by ikm (subscriber, #493) [Link]

Yes. My original line of thought was that remapping a block so that it is written in place of an already written, valid block requires more writes than just writing that block to a free location, which I thought defeated the purpose. But I was overlooking the fact that, while the scheme does indeed require more writes, it nevertheless allows wear to be spread evenly. Thanks for pointing that out; the other thing to point out is that LWN has got a great user base!

Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds