
NLUUG: The bright future of Linux filesystems

May 13, 2009

This article was contributed by Koen Vervloesem

As the maintainer of the ext4 file system, Ted Ts'o was the perfect speaker to open the recent NLUUG Spring Conference, whose theme was "File systems and storage". In his keynote at the conference in the Netherlands, he put recent developments and changes in file system and storage technologies into context.

His central question was: why has there been a flowering of new file systems showing up in Linux in the last 18 months? New file systems that have recently become available in the mainline kernel include ext4, btrfs, and UBIFS. The next Linux kernel release, 2.6.30, adds three new file systems: Nilfs, Pohmelfs, and exofs (formerly known as osdfs). Ts'o said that "it's now a fairly exciting time for file systems" and he added that this is partly thanks to Sun: "Sun woke up the field with their file system ZFS and they should deserve credit for it. Before the appearance of ZFS, the development of file systems virtually stood still for decades." At the moment, the Linux kernel tree lists 65 file systems, although most of them are optimized for a specific task and are not much used. Ts'o sees this as an opportunity for developers to experiment and innovate.

Of course, the development of all these file systems doesn't come out of the blue. It is driven by new developments in storage technology, such as the advent of solid state drives (SSDs), data integrity fields, and 4K sectors. SSDs in particular have changed a lot in the storage stack: "The shift from relatively slow hard disks to fast SSDs means that many assumptions in the storage stack don't hold anymore." Even though Ts'o does not expect SSDs to replace hard disks completely, he sees the shift as an interesting opportunity: "This spurs a lot of development, as people are finally talking about changing storage interfaces."

One change that is happening now is the shift from 512-byte to 4K physical sectors in hard drives. The 512-byte sector abstraction has been around for decades and is not easy to change, because the transition affects many subsystems that do not currently accept a 4K sector size. For example, the partitioning system and the bootloader require changes because both rely on the convention that the first partition starts at sector 63, which is not aligned with a 4K sector boundary. A proposed solution is to shift the 512-byte logical sectors so that the first logical sector starts at the second octant (512 bytes in) of the first physical 4K sector, which puts logical sector 63 on a 4K boundary. However, Microsoft Windows spoils the party because it starts partitions at a 1M boundary, which is incompatible with this "odd-aligned scheme". According to Ts'o, this is one of the reasons why storage vendors like to talk to open source projects: they want to move forward instead of holding on to legacy solutions. It remains to be seen whether Windows will join the party.
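The alignment problem above comes down to modular arithmetic. The following is a minimal sketch rather than anything shown in the talk; the function name is_aligned and its shift parameter are illustrative assumptions. It checks whether a partition's starting logical sector falls on a 4K physical boundary, with and without the one-slot shift of the odd-aligned scheme.

```python
LOGICAL = 512                    # bytes per logical sector
PHYSICAL = 4096                  # bytes per physical sector
PER_PHYS = PHYSICAL // LOGICAL   # 8 logical sectors per physical sector


def is_aligned(start_lba: int, shift: int = 0) -> bool:
    """Return True if logical sector start_lba begins on a 4K boundary.

    shift is the number of 512-byte slots by which logical sector 0 is
    offset into the first physical sector (0 = naturally aligned,
    1 = the odd-aligned scheme). Both names are hypothetical.
    """
    return (start_lba + shift) % PER_PHYS == 0


legacy_start = 63                           # traditional first-partition start
one_mib_start = (1024 * 1024) // LOGICAL    # sector 2048, a 1M boundary

print(is_aligned(legacy_start))             # False
print(is_aligned(legacy_start, shift=1))    # True
print(is_aligned(one_mib_start))            # True
print(is_aligned(one_mib_start, shift=1))   # False
```

The last two lines show the conflict: a shift that fixes a partition starting at sector 63 misaligns one that starts on a 1M boundary.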

Another change that Ts'o deems important is object-based storage. Instead of presenting the abstraction of an array of blocks, addressed by their index in the array (as traditional storage systems do), an object store presents the abstraction of a collection of objects, addressed by a unique id. If the operating system uses object-based storage, it stores an object with an id, without having to know overly low-level details such as the sector or cylinder of the block on the hard drive. When the operating system wants to read the object later, it only has to know the object's id. Ts'o sees many advantages in this approach: "With object-based storage, the operating system can push more intelligence into the hard disk, which is better placed anyway to make intelligent decisions and improve performance."
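The difference between the two abstractions can be sketched in a few lines. This is a toy in-memory model, not an implementation of any real object storage device interface; the class and method names (BlockDevice, ObjectStore, put, get) are assumptions chosen for illustration.

```python
import uuid


class BlockDevice:
    """Traditional abstraction: a fixed array of equally sized blocks."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def write(self, index, data):
        # The caller must know where on the device the data goes.
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = data

    def read(self, index):
        return self.blocks[index]


class ObjectStore:
    """Object abstraction: variable-sized objects addressed by a unique id."""

    def __init__(self):
        self._objects = {}

    def put(self, data):
        # The store decides placement; the caller only keeps the id.
        object_id = uuid.uuid4().hex
        self._objects[object_id] = data
        return object_id

    def get(self, object_id):
        return self._objects[object_id]


disk = BlockDevice(num_blocks=8)
disk.write(3, b"x" * 512)          # caller picks block 3

store = ObjectStore()
oid = store.put(b"any length of data")
print(store.get(oid))              # caller only needs the id
```

In the block model the caller chooses the location of every write; in the object model that decision moves behind the interface, which is the room for "more intelligence" in the device that Ts'o describes.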

Ts'o also notes that abstractions such as disks, RAID, logical volume management, and file systems are increasingly blending into each other. "Maybe those different interfaces don't make sense anymore? ZFS figured this out very well by building all those interfaces under the umbrella of the file system, and btrfs will do something similar." But he warns that this doesn't mean that people should settle for ZFS or btrfs: "I hope that developers will keep exploring abstractions to find the right interfaces." Ts'o also expressed his hope that the license incompatibility between ZFS (CDDL) and Linux (GPL) would get fixed.

As a typical example of the proliferation of specialized file systems, Jörn Engel talked at the NLUUG conference about LogFS, his scalable file system for flash devices. Because most current file systems are designed for rotating drives, and because flash-based storage has some quirks, Engel decided to design a file system explicitly for flash. He started with a Fast File System (FFS) style design and adjusted many of the algorithms to work better with flash. For example, for copy-on-write, FFS rewrites blocks in place after the copy. Flash storage, however, cannot simply be overwritten: a flash block must be erased and then rewritten in two separate steps, a requirement that can cause serious performance problems. Engel's solution was to use a log-structured design instead. Another issue is that the journal is written to storage frequently. Because there are limits to the number of times a block of flash memory can be erased and rewritten reliably, Engel's solution is to move the journal from time to time.
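The erase-before-rewrite constraint, and why appending to a log sidesteps it, can be illustrated with a toy model. This sketch is not LogFS code; ToyFlash, update_in_place, and ToyLog are hypothetical names, and the log has no garbage collection, which any real log-structured file system needs once the device fills up.

```python
class ToyFlash:
    """Toy flash: a page can be programmed only once per erase of its block."""

    def __init__(self, num_blocks, pages_per_block):
        self.pages_per_block = pages_per_block
        self.pages = [None] * (num_blocks * pages_per_block)
        self.erase_counts = [0] * num_blocks

    def program(self, page, data):
        if self.pages[page] is not None:
            raise RuntimeError("page must be erased before it is rewritten")
        self.pages[page] = data

    def erase(self, block):
        start = block * self.pages_per_block
        for page in range(start, start + self.pages_per_block):
            self.pages[page] = None
        self.erase_counts[block] += 1


def update_in_place(flash, page, data):
    """Rewrite one page at a fixed location: save, erase, reprogram the block."""
    block = page // flash.pages_per_block
    start = block * flash.pages_per_block
    saved = flash.pages[start:start + flash.pages_per_block]
    saved[page - start] = data
    flash.erase(block)
    for offset, content in enumerate(saved):
        if content is not None:
            flash.program(start + offset, content)


class ToyLog:
    """Append-only writer: each update goes to the next free page."""

    def __init__(self, flash):
        self.flash = flash
        self.head = 0
        self.latest = {}   # key -> page holding the newest version

    def write(self, key, data):
        self.flash.program(self.head, data)
        self.latest[key] = self.head
        self.head += 1

    def read(self, key):
        return self.flash.pages[self.latest[key]]


in_place = ToyFlash(num_blocks=2, pages_per_block=4)
for version in range(4):
    update_in_place(in_place, 0, b"v%d" % version)
print(in_place.erase_counts)    # [4, 0]

logged = ToyFlash(num_blocks=2, pages_per_block=4)
log = ToyLog(logged)
for version in range(4):
    log.write("file", b"v%d" % version)
print(logged.erase_counts)      # [0, 0]
print(log.read("file"))         # b'v3'
```

Four updates at a fixed location cost four erases of the same block, while the log spreads the same updates over fresh pages. Moving the journal periodically applies the same idea to wear: no single block absorbs all the erase cycles.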

Engel said that LogFS is almost ready for use. He is still chasing one hard-to-replicate bug, but, after that, he plans to submit the code for inclusion in the Linux kernel tree. LogFS should be better than JFFS2 on larger devices, because JFFS2 stores no filesystem directory tree on the device. This means that JFFS2 has to perform a time- and memory-consuming scan when it mounts the file system, building the directory tree at that time. Putting the tree on the device, as LogFS does, reduces mount time and memory requirements.
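The mount-time difference can be sketched as follows. This is a simplified illustration and does not reflect the actual on-media formats of JFFS2 or LogFS; the record layout (name, version, data) and both function names are assumptions made for the example.

```python
def mount_by_scanning(medium):
    """Scan-style mount: read every record to rebuild the name -> data index."""
    index = {}
    reads = 0
    for name, version, data in medium:   # one read per record on the medium
        reads += 1
        if name not in index or version > index[name][0]:
            index[name] = (version, data)
    return index, reads


def lookup_in_stored_tree(tree, path):
    """Tree-style lookup: follow one stored entry per path component."""
    node = tree
    reads = 0
    for component in path.strip("/").split("/"):
        reads += 1
        node = node[component]
    return node, reads


# Without a stored tree, every record must be read before any lookup works.
medium = [
    ("etc/hosts", 1, b"old"),
    ("bin/sh", 1, b"shell"),
    ("etc/hosts", 2, b"new"),
]
index, scan_reads = mount_by_scanning(medium)
print(index["etc/hosts"], scan_reads)    # (2, b'new') 3

# With a tree kept on the device, a lookup touches only the path it needs.
tree = {"etc": {"hosts": b"new"}, "bin": {"sh": b"shell"}}
print(lookup_in_stored_tree(tree, "/etc/hosts"))   # (b'new', 2)
```

The scan's cost and its in-memory index grow with everything stored on the device, whereas the tree lookup grows only with the depth of the path, which is why the difference matters most on larger devices.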

Speakers at the NLUUG Spring Conference covered many recent developments, not only in file systems, as Ts'o showed, but also higher in the storage stack. Michael Adam, for example, stressed that Samba, which started as a free re-implementation of Microsoft's SMB/CIFS networking protocol, allows setting up a clustered CIFS server, a feature that current Microsoft servers do not offer.

The NLUUG Spring Conference was an interesting event thanks to the breadth of the topics presented. On the one hand, there were introductory talks about the possibilities of ZFS, the virtual filesystem libferris, and practical experiences with WebDAV. On the other hand, visitors could get first-hand and highly specific information about the future direction of projects like DRBD, device-mapper, and LogFS. This way, the conference had something for everyone: it gave a broad overview of the current state of the art in file systems and storage, while providing enough technical detail for those interested in it. At least your author came home with a better understanding of file systems and storage in the Linux ecosystem.



NLUUG: The bright future of Linux filesystems

Posted May 14, 2009 4:08 UTC (Thu) by tnoo (subscriber, #20427) [Link]

> Before the appearance of ZFS, the development of file systems virtually
> stood still for decades.

What a bold statement. Reiser3 (and Reiser4) immediately come to mind, but
there might be more.

NLUUG: The bright future of Linux filesystems

Posted May 14, 2009 6:08 UTC (Thu) by butlerm (subscriber, #13312) [Link]

I agree that the "stood still for decades" seems to be a bit of an
exaggeration. During the 1990s journalling filesystems became commonplace,
as did btree structured directories. Netapp (aside from its proprietary
nature) has probably done more to date to advance the state of the art in
filesystems than anyone else has, due to perhaps a fifteen year head start
over ZFS. ZFS is so fundamentally different from WAFL that I don't know
how anyone could confuse them though.

Other than ZFS and perhaps Reiserfs, there does appear to have been a slow
period in filesystem development for about the past decade. It is great to
see that things have picked up.

What I am hoping to hear is that the BTRFS or possibly ZFS folks have
figured out how to support a write-in-place mode, so that the storage of
filesystems within filesystems as well as certain large databases does not
degrade excessively due to fragmentation issues.

NLUUG: The bright future of Linux filesystems

Posted May 14, 2009 8:06 UTC (Thu) by sourcejedi (guest, #45153) [Link]

I believe btrfs has a "no COW" option. I'm not sure how much functionality you lose though - it's unlikely to support snapshots, and I have a feeling that it may disable checksumming as well.

NLUUG: The bright future of Linux filesystems

Posted May 14, 2009 15:59 UTC (Thu) by masoncl (subscriber, #47138) [Link]

The Btrfs nodatacow mode does disable checksumming, but it does not disable snapshots.

When you snapshot or create a clone of a nodatacow file, COW is enforced for the first write of each block after the snapshot, and then things go back to the regular nodatacow mode.

NLUUG: The bright future of Linux filesystems

Posted May 14, 2009 10:20 UTC (Thu) by nix (subscriber, #2304) [Link]

nilfs and pohmelfs look seriously neat. I have two concerns though:

- nilfs is log-structured, which tends to be nearly pessimal for performance on rotating storage, as you rarely access files in exactly the same order as they were written. This problem is severe enough that it has torpedoed most log-structured filesystems in the past. How does nilfs avoid it?

- pohmelfs looked seriously neat as an NFS replacement (the performance figures are very impressive), but recent developments appear to be pushing it towards serving data from a memcached-like key->value store. This is very far from the NFS model ('a server with lots of stuff on it, we want to export that stuff without changing it on the server', so making a new local filesystem is out). I get the impression from Evgeniy's blog that the intent may be to allow pluggable backends, so that pohmelfs can serve its data *either* from a local fs or from a key->value store, but I have no idea if this is actually true (yet).

I guess I should damn well try it out and set up some pohmelfs-using systems myself. Anything that can outdo NFS (and is Unixlike: sorry, SMB) I'm in favour of.

NLUUG: The bright future of Linux filesystems

Posted May 27, 2009 5:53 UTC (Wed) by shehjart (guest, #58602) [Link]

Consider it a plug, but LWN did a short story on our clustered file system a few weeks back. It'd be worth investigating for your needs if you're looking for an NFS replacement.

Here's a link:
"GlusterFS 2.0 released"
http://lwn.net/Articles/333397/

NLUUG: The bright future of Linux filesystems

Posted May 14, 2009 14:29 UTC (Thu) by job (guest, #670) [Link]

LogFS and NILFS are two new log-structured filesystems. It would be interesting to read a comparison of the two. POHMELFS and CRFS also have similarities and an overview of both is here.

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds