
Log-structured file systems: There's one in every SSD

Posted Sep 20, 2009 8:17 UTC (Sun) by jzbiciak (guest, #5246)
Parent article: Log-structured file systems: There's one in every SSD

Nitpicking on an otherwise excellent article:

This requires clearing an erase block, moving all the in-use blocks to another area, and keeping a mapping between the logical location of blocks and their physical locations - exactly what a log-structured file system does.

I really do hope that the in-use blocks get moved before the erase block gets cleared. ;-) The only other detail I'd nitpick about is that an erase block can generally get filled incrementally. At least, that's been my experience with embedded flash.

Where that last detail matters: one doesn't need to gather many kilobytes of writes a priori before scheduling an erase block for erasure. Rather, once an erase block is erased, the next several kilobytes of writes should all go to that block. Erasing commits the destination of future writes.
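To make the "filled incrementally" point concrete, here's a toy C sketch (entirely invented for illustration; real drivers go through an MTD layer, not a byte array). Erasing sets every bit to 1; programming can only clear bits, so each byte is effectively write-once, but pages within one erase block can still be programmed one at a time as writes arrive:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ERASE_BLOCK_SIZE 4096

/* Simulated erase block: erasing sets all bits to 1 (0xFF). */
static uint8_t block[ERASE_BLOCK_SIZE];
static size_t write_ptr;   /* next free offset inside the block */

static void erase_block(void) {
    memset(block, 0xFF, sizeof block);
    write_ptr = 0;
}

/* Programming can only clear bits, so each location is written once,
 * but the erase block as a whole fills up incrementally: successive
 * small writes land at successive offsets until the block is full. */
static int program_bytes(const uint8_t *data, size_t len) {
    if (write_ptr + len > ERASE_BLOCK_SIZE)
        return -1;                        /* block full: erase another one */
    for (size_t i = 0; i < len; i++)
        block[write_ptr + i] &= data[i];  /* NAND-style: 1 -> 0 only */
    write_ptr += len;
    return 0;
}
```

Once erase_block() runs, every subsequent program_bytes() call is committed to landing in this block until it fills, which is exactly the "committing the destination of future writes" trade-off.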

I took particular interest in this article since I'm designing a special purpose "file system" for a flash-based microcontroller. The flash in the microcontroller can't tolerate very many rewrite cycles (100 worst case, 1000 nominal), so I'm relying heavily on wear leveling, error correction, and having far more physical cells than data stored to make my application work. (I'm storing around 1.5K bytes of dynamic data in about 100K of space, which helps. And yes, my approach looks like a giant log, and if the most recent save ends up corrupt when I eventually read it, I plan on rolling back to the previous one, since it's right there in the log.)
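That rollback-in-the-log scheme can be sketched in a few lines of C. Everything here is a made-up illustration (the slot layout, the record format, and the toy checksum standing in for real CRC/ECC are my assumptions, not anyone's actual design): scan all slots, skip anything that fails its checksum, and take the valid record with the highest sequence number.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLOTS    8
#define DATA_LEN 16

struct record {
    uint32_t seq;              /* monotonically increasing save counter */
    uint8_t  data[DATA_LEN];
    uint32_t check;            /* toy checksum; real code would use CRC/ECC */
};

static uint32_t checksum(const struct record *r) {
    uint32_t s = r->seq;
    for (int i = 0; i < DATA_LEN; i++)
        s = s * 31 + r->data[i];
    return s;
}

/* Return the index of the newest record whose checksum verifies,
 * silently rolling back past a corrupt most-recent save. */
static int find_latest_valid(const struct record log[SLOTS]) {
    int best = -1;
    uint32_t best_seq = 0;
    for (int i = 0; i < SLOTS; i++) {
        if (log[i].seq == 0)
            continue;          /* empty slot */
        if (log[i].check != checksum(&log[i]))
            continue;          /* corrupt save: ignore it */
        if (best < 0 || log[i].seq > best_seq) {
            best = i;
            best_seq = log[i].seq;
        }
    }
    return best;               /* -1 if no valid save exists */
}
```

If the newest save was interrupted mid-write, its checksum fails and the scan naturally falls back to the previous save, with no separate recovery path needed.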

And on a different note: I personally think the absurdly poor performance of TRIM (tens or hundreds of milliseconds? Really?) on some devices is just criminal. Better not to support TRIM at all than to have it perform that badly. I can't wait until manufacturers get it right. I really want to go SSD on my laptop, but only once the technology gets sane.

As far as how TRIM should be treated: I agree filesystems should treat it like alloc()/free(). At the same time, we should tolerate losing some free() calls in the case of a system failure. A trim request doesn't have the same correctness requirements as, say, a journal update, so in some sense it makes sense to let trims be lossy near a crash boundary.

I'm up a little too late to think this through completely, but I think there are cases where you can't be 100% sure that you've executed all of your TRIMs or recorded all of your intents to TRIM before you've committed all the necessary actions to free a write block. If I'm correct (and I'd lay even odds that I'm not, at this point), that means that you'll need some mechanism to scan and TRIM in the background on an unclean unmount, even with an otherwise airtight filesystem.
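The background pass I'm imagining could look something like this toy C sketch (all names and the bitmap layout are hypothetical; a real filesystem would issue a discard ioctl against real metadata). The key property making it safe is that trims are idempotent: the free set can be re-derived from crash-consistent metadata and any possibly-lost trims simply reissued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 16

/* fs_used[i]: block i is allocated according to on-disk metadata.
 * dev_trimmed[i]: the device has been told block i is unused.
 * A crash can lose trim requests, so after an unclean unmount the
 * two can disagree: free in the FS, but never trimmed on the device. */
static bool fs_used[NBLOCKS];
static bool dev_trimmed[NBLOCKS];

static int trims_issued;

static void issue_trim(size_t blk) {   /* stand-in for a real discard ioctl */
    dev_trimmed[blk] = true;
    trims_issued++;
}

/* Background scan after an unclean unmount: walk the allocation
 * bitmap and reissue a trim for every free block whose trim may
 * have been lost. Trimming a block twice is harmless. */
static void retrim_free_blocks(void) {
    for (size_t i = 0; i < NBLOCKS; i++)
        if (!fs_used[i] && !dev_trimmed[i])
            issue_trim(i);
}
```

Since the filesystem generally can't know which trims reached the device before the crash, the conservative version would retrim every free block; the dev_trimmed check here just stands in for whatever tracking (if any) survived.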



Log-structured file systems: There's one in every SSD

Posted Sep 22, 2009 5:22 UTC (Tue) by butlerm (subscriber, #13312) [Link] (1 responses)

Apparently Micron's flash chips have the ability to move data around internally without having it leave the chip. No doubt very useful in this application.

Log-structured file systems: There's one in every SSD

Posted Sep 23, 2009 0:48 UTC (Wed) by dwmw2 (subscriber, #2063) [Link]

Not really.

The 'read and then reprogram elsewhere from internal buffer' facility is all very well in theory, but your ECC is off-chip. So if you want to be able to detect and correct ECC errors as you're moving the data, rather than allowing them to propagate, then you need to do a proper read and write instead.
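The distinction can be shown with a toy C sketch (everything here is invented for illustration, and a single XOR byte stands in for real multi-bit ECC, so this version only detects errors rather than correcting them): an on-chip copy-back moves raw bytes without ever consulting the error-check data, while a host-side copy crosses the bus and can be verified before rewriting.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE 8

struct page {
    uint8_t data[PAGE];
    uint8_t sum;               /* toy XOR check standing in for real ECC */
};

static uint8_t sum8(const uint8_t *d) {
    uint8_t s = 0;
    for (int i = 0; i < PAGE; i++)
        s ^= d[i];
    return s;
}

/* On-chip copy-back: raw bytes move inside the chip, so a bit flip
 * in the source is silently propagated to the destination. */
static void copy_internal(struct page *dst, const struct page *src) {
    *dst = *src;
}

/* Host-side copy: the data crosses the bus, so the off-chip
 * controller can verify it (and, with real ECC, correct it)
 * before programming the new location. */
static int copy_via_host(struct page *dst, const struct page *src) {
    if (sum8(src->data) != src->sum)
        return -1;             /* error detected; refuse to propagate it */
    memcpy(dst->data, src->data, PAGE);
    dst->sum = sum8(dst->data);
    return 0;
}
```

With real ECC the host-side path would correct the flipped bit and write clean data; the internal copy-back bakes the error into the new copy either way.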

Linux has never bothered to use the 'copy page' operation on NAND chips which support it.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds