May 13, 2009
This article was contributed by Koen Vervloesem
As the maintainer for the ext4 file system, Ted Ts'o was the perfect
speaker to open the recent NLUUG Spring Conference with the theme "File
systems and storage". In his keynote at the conference in the
Netherlands, he placed into context some developments and changes in file
system and storage technologies.
His central question was: why has there been a flowering of new file
systems showing up in Linux in the last 18 months? New file systems that
have recently become available in the mainline kernel include ext4, btrfs,
and UBIFS. The next Linux
kernel release, 2.6.30, adds three new file systems: Nilfs, Pohmelfs, and exofs (formerly
known as osdfs). Ts'o said that "it's now a fairly exciting time for
file systems" and he added that this is partly thanks to Sun:
"Sun woke up the field with their file system ZFS and they should
deserve credit for it. Before the appearance of ZFS, the development of
file systems virtually stood still for decades." At the moment, the
Linux kernel tree lists 65 file systems, although most of them are
optimized for a specific task and are not much used. Ts'o sees this as an
opportunity for developers to experiment and innovate.
Of course the development of all these file systems doesn't come out of
the blue. They are driven by some new developments in storage technology,
such as the advent of solid state drives (SSDs), data integrity fields, and
4K sectors. SSDs have especially changed a lot in the storage stack:
"The shift from relatively slow hard disks to fast SSDs means that
many assumptions in the storage stack don't hold anymore." Even though
Ts'o does not expect SSDs to replace hard disks completely, he sees the shift as an
interesting opportunity: "This spurs a lot of development, as people
are finally talking about changing storage interfaces."
One change that is happening now is the shift from 512-byte physical sectors
to 4K in hard drives. The abstraction of 512-byte sector sizes
has been here for decades, and it's not easy to change, as the
transition affects a lot of subsystems that don't accept a 4K sector size
currently. For example, the partitioning system and the bootloader require
changes because they both rely on the fact that partitions start from the
63rd sector of the drive, which is misaligned with the 4K sector
boundary. A proposed solution is to offset the 512-byte logical sectors so
that the first logical sector starts at the second octant (512 bytes in) of
the first physical 4K sector, which moves sector 63 onto a 4K boundary.
However, Microsoft Windows spoils the party
because it starts the first partition at a 1M boundary, which is
incompatible with this "odd-aligned scheme". According to Ts'o, this is one
of the reasons why storage vendors like to talk to open source projects:
they want to move forward instead of holding on to legacy solutions. It
remains to be seen whether Windows will join the party.
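The alignment arithmetic behind this conflict is easy to check. The following is a minimal sketch in Python, not anything shown in the talk; the function names are made up for illustration, and it assumes that "sector 63" means logical block address 63 and that the 1M boundary corresponds to logical sector 2048.

```python
PHYSICAL = 4096  # bytes per physical sector
LOGICAL = 512    # bytes per logical sector


def physical_offset(lba, odd_aligned=False):
    """Byte offset on the medium where logical sector `lba` begins.

    In the odd-aligned scheme, logical sector 0 sits 512 bytes into the
    first physical sector, so every logical sector is shifted by one slot.
    """
    shift = 1 if odd_aligned else 0
    return (lba + shift) * LOGICAL


def is_aligned(lba, odd_aligned=False):
    """True if logical sector `lba` begins on a physical 4K boundary."""
    return physical_offset(lba, odd_aligned) % PHYSICAL == 0


for name, lba in [("legacy start (sector 63)", 63), ("1M boundary", 2048)]:
    for odd in (False, True):
        scheme = "odd-aligned" if odd else "plain"
        print(f"{name:26} {scheme:12} aligned={is_aligned(lba, odd)}")
```

Under these assumptions, a partition starting at sector 63 is aligned only with the odd-aligned scheme, while a partition starting at 1M is aligned only without it, so no single layout satisfies both conventions.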
Another change that Ts'o deems important is object-based storage. Instead of
presenting the abstraction of an array of blocks, addressed by their index
in the array (as traditional storage systems do), an object store presents
the abstraction of a collection of objects, addressed by a unique id. If
the operating system uses object-based storage, it stores an object with an
id, without having to know overly low-level details such as the sector or
cylinder of the block on the hard drive. When the operating system wants to
read the object later, it only has to know the object's id. Ts'o sees many
advantages in this approach: "With object-based storage, the
operating system can push more intelligence into the hard disk, which is
better placed anyway to make intelligent decisions and improve
performance."
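To make the difference between the two abstractions concrete, here is a toy sketch in Python. The class and method names (BlockDevice, ObjectStore, put, get) are invented for this illustration and do not correspond to any real object storage interface; the in-memory dictionary merely stands in for whatever placement logic a real device would use.

```python
import uuid


class BlockDevice:
    """Traditional abstraction: a fixed array of equally sized blocks."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def write(self, index, data):
        # The caller decides where the data lives by choosing the index.
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = bytes(data)

    def read(self, index):
        return self.blocks[index]


class ObjectStore:
    """Object abstraction: variable-sized objects addressed by a unique id."""

    def __init__(self):
        self._objects = {}

    def put(self, data):
        # The store, not the caller, decides where the data is placed.
        object_id = uuid.uuid4().hex
        self._objects[object_id] = bytes(data)
        return object_id

    def get(self, object_id):
        return self._objects[object_id]


disk = BlockDevice(num_blocks=8)
disk.write(3, b"x" * 512)           # caller must track that block 3 is in use

store = ObjectStore()
oid = store.put(b"any length of data")
print(store.get(oid))               # caller only needs to remember the id
```

The point of the sketch is only the shape of the interface: with the block device the operating system manages layout itself, while with the object store layout decisions move to the other side of the interface.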
Ts'o also notes that abstractions such as disks, RAID, logical volume
management, and file systems are more and more blending into each
other. "Maybe those different interfaces don't make sense anymore?
ZFS figured this out very well by building all those interfaces under the
umbrella of the file system, and btrfs will do something similar."
But he warns that this doesn't mean that people should settle for ZFS or
btrfs: "I hope that developers will keep exploring abstractions to
find the right interfaces." Ts'o also expressed his hope that the
license incompatibility between ZFS (CDDL) and Linux (GPL) would get
fixed.
As a typical example of the proliferation of specialized file systems,
Jörn Engel talked at the NLUUG conference about LogFS, his scalable file system
for flash devices. Because most current file systems are designed for use on
rotating drives, and because flash-based storage has some quirks, Engel decided
to design a file system explicitly for flash. He started with a fast
filesystem (FFS) style
design and adjusted a lot of the algorithms to work better with flash. For
example, for copy-on-write, FFS rewrites blocks in place after the
copy. Because flash storage cannot be simply overwritten, a flash block
must be erased and rewritten in two separate steps, a requirement which can
cause serious performance problems. Engel's solution was to
use a log-structured design instead. Another issue was that the journal is written
to storage very often. Because there are limits to the number of
times a block of flash memory can be erased and rewritten reliably, Engel's
solution is to move the journal from time to time.
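The cost of in-place updates on flash can be illustrated with a rough model. The sketch below, in Python, is not LogFS code: the geometry (4 erase blocks of 64 pages) and both function names are assumptions, and it considers only repeated updates to a single logical page, so garbage collection of live data is ignored.

```python
PAGES_PER_BLOCK = 64   # assumed geometry, for illustration only
NUM_BLOCKS = 4


def erases_in_place(updates):
    """Rewrite one logical page at a fixed location.

    The first write lands on an already erased page; every later update
    must erase the containing block before the page can be written again.
    """
    return max(updates - 1, 0)


def erases_log_structured(updates):
    """Append every new version at the head of a circular log.

    A block is erased only when the head wraps around and re-enters it.
    Only the newest version is live, and it always sits just behind the
    head, so the block being re-entered holds nothing but stale data.
    """
    erases = 0
    written = [False] * NUM_BLOCKS
    total_pages = NUM_BLOCKS * PAGES_PER_BLOCK
    for n in range(updates):
        block, page = divmod(n % total_pages, PAGES_PER_BLOCK)
        if page == 0 and written[block]:
            erases += 1          # reclaim the stale block before reuse
        written[block] = True
    return erases


print(erases_in_place(1000))        # 999
print(erases_log_structured(1000))  # 12
```

In this model the log-structured variant also spreads its erases evenly over all blocks, which hints at why writing a frequently updated structure such as the journal to changing locations helps with the limited number of erase cycles.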
Engel said that LogFS is almost ready for use. He is still chasing one
hard-to-replicate bug, but, after that, he plans to submit the code for
inclusion in the Linux kernel tree. LogFS should be better than JFFS2 on
larger devices, because JFFS2 stores no filesystem directory tree on the
device. This means that JFFS2 has to perform a time- and memory-consuming
scan when it mounts the file system, building the directory tree at
that time. Putting the tree on the device, as LogFS does, reduces mount
time and memory requirements.
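The difference in mount cost can be sketched with a toy model. The Python below does not reflect the real on-media formats of JFFS2 or LogFS; the data layouts, names, and read counters are assumptions chosen only to show why a stored tree avoids a full scan.

```python
# Scan-at-mount model: the device holds only a flat log of
# (node_id, parent_id, name) records and no directory tree.
LOG = [(1, 0, "etc"), (2, 0, "home"), (3, 1, "fstab")]

# Stored-tree model: the device also holds the tree itself,
# as directory id -> {name: node_id}.
ON_DEVICE_TREE = {0: {"etc": 1, "home": 2}, 1: {"fstab": 3}, 2: {}}


def mount_by_scan(log):
    """Read every record on the device and build the whole tree in memory."""
    tree = {}
    reads = 0
    for node_id, parent_id, name in log:
        reads += 1
        tree.setdefault(parent_id, {})[name] = node_id
    return tree, reads


def mount_from_tree(device_tree, root_id=0):
    """Read only the root directory; deeper levels are fetched on demand."""
    return {root_id: device_tree[root_id]}, 1


tree, reads = mount_by_scan(LOG)
print(reads, len(tree))      # reads grow with the number of records

cached, reads = mount_from_tree(ON_DEVICE_TREE)
print(reads, len(cached))    # one read, one directory held in memory
```

In the first model both the number of reads and the memory held after mount grow with the size of the device contents; in the second they stay constant at mount time, at the price of keeping the tree up to date on the medium.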
The NLUUG Spring Conference covered many recent developments, not only in
file systems, as Ts'o showed, but also higher in
the storage stack. Michael Adam, for example, stressed that Samba, which
started as a free re-implementation of Microsoft's SMB/CIFS networking
protocol, allows for setting up a clustered CIFS server, a feature that
current Microsoft servers do not offer.
The NLUUG Spring Conference was an interesting event thanks to the breadth
of the topics presented. On the one hand there were introductory talks
about the possibilities of ZFS, the virtual filesystem libferris and
practical experiences with WebDAV. On the other hand, visitors could get
some first-hand and highly specific information about the future direction
of projects like DRBD, device-mapper and LogFS. This way, the conference
had something for everyone: it gave a broad overview of the current state
of the art in file systems and storage, while providing enough technical
details for those interested in them. At least your author came home with a
better understanding of file systems and storage in the Linux ecosystem.