Jörn Engel gave a talk on the death of hard disks. His core point is
that flash-based storage is faster, requires less power, makes less noise,
and is more robust than rotating storage. It is also more expensive, for
now, but flash prices are falling much more quickly than disk prices. Jörn projects
that flash-based drives will become more economical than hard drives
between 2012 and 2019, depending on which drives one looks at.

Flash makes life easier in a number of ways; the lack of seek delays, for
example, means that much of the trouble the kernel goes to in scheduling
block I/O operations can be eliminated. On the other hand, flash has
challenges of its own: it is not quite the random-access array of blocks
that one would like. In particular, writing to flash requires dealing with
wear-leveling issues, erase operations, and more.

Manufacturers have done their best to paper over these issues through the
use of translation layers which make a flash array look like a simple disk
drive. These layers make it easier to use flash with existing software,
but there are problems: performance is not always what one would like, and
there can be hidden caches which delay the persistent storage of data. So
Jörn has a request to the flash manufacturers: give us direct access
to the flash array, without translation layers, and let us figure out how
to best support it.
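
As a rough illustration of the work a translation layer has to do, here is a
toy Python model. It is a sketch under stated assumptions, not any vendor's
firmware: the class name ToyFTL, the tiny geometry, and the reclaim policy
are all invented for this example. It models only two constraints: a page
must be erased before it can be rewritten, and erases happen a whole block
at a time, so the layer redirects each write to a fresh page and favors the
least-worn block.

```python
class ToyFTL:
    """Toy flash translation layer (hypothetical; for illustration only)."""

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.erase_counts = [0] * num_blocks
        # pages[b][p] is None when erased, else (logical_block, data).
        self.pages = [[None] * pages_per_block for _ in range(num_blocks)]
        self.mapping = {}  # logical block -> (erase block, page)

    def _is_live(self, b, p):
        entry = self.pages[b][p]
        return entry is not None and self.mapping.get(entry[0]) == (b, p)

    def _free_page(self):
        # Wear leveling: pick the least-erased block that has an erased page.
        best = None
        for b, block in enumerate(self.pages):
            for p, entry in enumerate(block):
                if entry is None:
                    if best is None or self.erase_counts[b] < self.erase_counts[best[0]]:
                        best = (b, p)
                    break
        return best

    def _reclaim(self):
        # Find blocks holding at least one stale (superseded) page.
        victims = [b for b, block in enumerate(self.pages)
                   if any(e is not None and not self._is_live(b, p)
                          for p, e in enumerate(block))]
        if not victims:
            raise RuntimeError("flash is full")
        b = min(victims, key=lambda v: self.erase_counts[v])
        # Buffer the live pages, erase the whole block, write them back.
        live = [e for p, e in enumerate(self.pages[b]) if self._is_live(b, p)]
        self.pages[b] = [None] * len(self.pages[b])
        self.erase_counts[b] += 1
        for p, entry in enumerate(live):
            self.pages[b][p] = entry
            self.mapping[entry[0]] = (b, p)

    def write(self, logical, data):
        slot = self._free_page()
        if slot is None:
            self._reclaim()
            slot = self._free_page()
        b, p = slot
        self.pages[b][p] = (logical, data)
        self.mapping[logical] = (b, p)  # the previous page is now stale

    def read(self, logical):
        b, p = self.mapping[logical]
        return self.pages[b][p][1]
```

Even in this toy, a single logical write can trigger an erase and several
page copies behind the caller's back, which is one way the unpredictable
performance mentioned above can arise.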
Chris Mason is not waiting for flash to take over; instead, he is working
on the next-generation Linux filesystem for rotating disks. The result, Btrfs, was the subject of
Chris's talk at LCE. LWN covered
Btrfs last June.

Chris's motivation is the fact that disks are, for all practical purposes,
getting slower - the time required to read an entire disk is growing. Most
systems still store large numbers of small files, leading to a lot of
wasted space. Btrfs tries to address these issues and provide a number of
interesting features as well. It is extent-based, resulting in more
efficient storage of larger files. Small files are packed into the
filesystem tree itself, eliminating the internal fragmentation experienced
by a number of other filesystems. It has indexed directories, data and
metadata checksums, efficient snapshots, sequence numbers in objects
(facilitating quick and easy incremental backups), an online filesystem
checker in the works, and more.
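
To make "extent-based" concrete, the sketch below (Python, with a
hypothetical helper name; it does not reflect the actual Btrfs on-disk
format) collapses a per-block list into (start, length) extents. A large
contiguous file then needs only a handful of records rather than one per
block.

```python
def blocks_to_extents(block_numbers):
    """Coalesce a file's block numbers into (start, length) extents."""
    extents = []
    for block in block_numbers:
        if extents and extents[-1][0] + extents[-1][1] == block:
            # The block directly follows the last extent: extend it.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # A gap (or the first block): start a new extent.
            extents.append((block, 1))
    return extents


# A contiguous 1000-block file is described by a single extent.
print(blocks_to_extents(list(range(1000, 2000))))  # [(1000, 1000)]
```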
The directories are actually indexed twice. One index is there for fast
filename lookup; the other one, instead, lets the readdir() system
call return files in inode-number order, speeding filesystem traversals.
Extended attributes are stored as directory entries. Every file has a
backpointer to its containing directory - and, yes, multiply-linked files
have backpointers to all of the directories in which they are found.
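
The double indexing described above can be sketched in a few lines of
Python. This is only an illustration of the idea: the ToyDirectory class
and its in-memory structures are assumptions made for the example, and
Btrfs itself keeps these indexes in its on-disk tree.

```python
import bisect


class ToyDirectory:
    """Directory with two indexes (hypothetical; for illustration only)."""

    def __init__(self):
        self.by_name = {}   # name -> inode number, for fast lookup
        self.by_inode = []  # sorted (inode, name) pairs, for readdir order

    def add(self, name, inode):
        if name in self.by_name:
            raise FileExistsError(name)
        self.by_name[name] = inode
        bisect.insort(self.by_inode, (inode, name))

    def remove(self, name):
        inode = self.by_name.pop(name)
        self.by_inode.remove((inode, name))

    def lookup(self, name):
        return self.by_name[name]

    def readdir(self):
        # Names come back in inode-number order, not creation or name order.
        return [name for _inode, name in self.by_inode]


d = ToyDirectory()
d.add("zeta", 12)
d.add("alpha", 30)
d.add("mid", 20)
print(d.readdir())  # ['zeta', 'mid', 'alpha']
```

A traversal that reads files in the order readdir() returns them then
visits inodes in ascending order, which is the property the talk credited
with speeding up filesystem walks.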
Perhaps the most fun part of the talk was the plots Chris has generated
from various benchmark runs. The limiting factor on filesystem performance
is generally disk seeks; it is important to minimize disk head movement.
In general, ext3 tends to move the disk head all over the platter during
benchmark runs while Btrfs and XFS do better. Chris noted that better
writeback clustering in the virtual memory subsystem would help ext3.
More benchmark plots (some animated) can be found in the Btrfs
benchmark and Seekwatcher pages.

Toward the end, Chris was asked whether performance slows down when the
disk gets full. The answer was "no" because the system crashes instead.
That's a good reminder that Btrfs remains an early-stage development; the
on-disk format has not even been finalized yet. But the production version
of Btrfs is certainly something to look forward to.

Back in 2000, the British Computer Society awarded its Lovelace Medal to
Linus Torvalds. In 2007, the society finally caught up with him to deliver
the medal - though, as speaker Dr. David Hartley noted, they were probably
almost as quick as the post office would have been. As is typically the
case, Linus seemed somewhat embarrassed by the attention.

LinuxConf Europe intends to be a conference on a truly European scale. To
that end, next year's event will likely move to Germany; the details were
not yet finalized to the point that the location could be announced at this
year's conference, though. LCE, helped by the kernel summit, has gotten
this institution off to a good start; your editor is looking forward to
next year's edition.