By Jonathan Corbet
April 2, 2008
The steady growth in flash-based memory devices looks set to transform
parts of the storage industry. Flash has a number of advantages over
rotating magnetic storage: it is smaller, has no moving parts, requires
less power, makes less noise, is truly random access, and it has the
potential to be faster.
But flash is not without its own idiosyncrasies. Flash-based devices
operate on much larger blocks of data: 32KB or more. Rewriting a portion
of a block requires running an erase cycle on the entire block (which can
be quite slow) and writing the entire block's contents. There is a limit
to the number of times a block can be erased before it begins to corrupt
the data stored there; that limit is low enough that it can bring a
premature end to a flash-based device's life, especially if the same block
is repeatedly rewritten. And so on.
A number of approaches exist for making flash-based devices work well.
Many devices, such as USB drives, include a software "flash translation
layer" (FTL); this layer performs the necessary impedance matching to make
a flash device look like an ordinary block device with small sectors.
Internally, the FTL maintains a mapping between logical blocks and physical
erase blocks which allows it to perform wear leveling - distributing
rewrite operations across the device so that no specific erase block wears
out before its time - though some observers question whether low-end flash
devices bother to do that. The use of FTL layers makes life easy for the rest of
the system, but it is not necessarily the way to get the best performance
out of the hardware.
If you can get to the device directly, without an FTL getting in the way,
it is possible to create filesystems which embody an awareness of how flash
works. Most of our contemporary filesystems are designed around rotating
storage, with the result that they work hard to minimize time-consuming
operations like head seeks. A flash-based filesystem need not worry about
such issues, but it must be concerned about things like erase blocks
instead. So making the best use of flash requires a filesystem written
with flash in mind.
The main filesystem for flash-based devices on Linux is the venerable
JFFS2. This filesystem works, but it was designed for devices which are
rather smaller than those available today. Since JFFS2 must do things like
rebuild the entire directory tree at mount time, it can be quite slow on
large devices - for relatively small values of "large" by 2008 standards.
JFFS2 is widely seen as reaching the end of its time.
A more contemporary alternative is LogFS, which has been discussed on these pages in the
past. This work remains unfinished, though, and development has been
relatively slow in recent times; LogFS has not yet been seriously
considered for merging into the mainline. A more recent contender is UBIFS; this code is in a state
of relative completion and its developers are asking for serious review.
UBIFS depends on the UBI layer, which was merged for 2.6.22. UBI
("unsorted block images") is not, technically, an FTL, but it performs a
number of the same functions. At the heart of UBI is a translation table
which maps logical erase blocks (LEBs) onto physical erase blocks (PEBs).
So software using UBI to access flash sees a device providing a simple set
of sequential blocks which apparently do not move. In fact, when an LEB is
rewritten, the new data will be placed into a different location on the
physical device, but the upper layers know nothing about it. So UBI makes
problems like wear leveling and bad block avoidance go away for the upper
layers. UBI also takes care of running time-consuming erase operations in
the background when possible so that upper layers need not wait when
writing a block.
One little problem with UBI is that the logical-to-physical mapping
information is stored in the header of each erase block. So when the UBI
layer initializes a flash device, it must read the header from every block
to build the mapping table in memory; this operation clearly takes time.
For 1GB flash devices, this initialization overhead is tolerable; in the
future, when we'll be booting our laptops with terabyte-sized flash drives
in them, the linear scan will be a problem. The UBIFS developers are aware
of this issue, but believe that it can be solved at the UBI level without
affecting the higher-level filesystem code.
By using UBI, the UBIFS developers are able to stop worrying about some
aspects of flash-based filesystem design. Other problems remain, though.
For example, the large erase blocks provided by flash devices require
filesystems to track data at the sub-block level and to perform occasional
garbage collection: coalescing useful information into new blocks so that
the remaining "dead" space can be reclaimed. Garbage collection, along
with the
potential for blocks to turn bad, makes space management on flash
devices tricky: freeing space may require using more space first, and there
is no way to know how much space will actually become available until the
work has been done.
In the case of UBIFS, space management is an even trickier problem for a
couple of reasons. One is that, like a number of other flash filesystems,
UBIFS performs transparent compression of the data. The other is that,
unlike JFFS2, UBIFS provides full writeback support, allowing data to be
cached in memory for some time before being written to the physical media.
Writeback gives large performance improvements and reduces wear on the
device, but it can lead to big trouble if the filesystem commits to writing
back more data than it actually has the space to store. To deal with this
problem, UBIFS includes a complex "budgeting" layer which manages
outstanding writes with pessimistic assumptions on what will be possible.
Like LogFS, UBIFS uses a "wandering tree" structure to percolate changes up
through the filesystem in an atomic manner. UBIFS also uses a journal,
though, to minimize the number of rewrites to the upper-level nodes in the
tree.
The latest UBIFS posting raised questions about how it compares with
LogFS. The resulting discussion was ... not entirely technical, but a few
clear points came out. UBIFS is in a more complete state and appears to
perform quite a bit better at this time. LogFS is a lot less code, avoids
the boot-time linear scan of the device, and is able to work (with some
flash awareness) through an FTL. Which is better is not a question your
editor is prepared to answer at this time; what does seem clear is that the
growing competition between the two projects has the potential to inspire
big improvements on both sides in the near future.
(
Log in to post comments)