Next-generation Linux file systems: NiLFS(2) and exofs (developerWorks)

IBM developerWorks has posted an introduction to the NiLFS(2) and exofs filesystems. "An interesting aspect of NiLFS(2) is its technique of continuous snap-shotting. As NILFS is log structured, new data is written to the head of the log while old data still exists (until it's necessary to garbage-collect it). Because the old data is there, you can step back in time to inspect epochs of the file system. These epochs are called checkpoints in NiLFS(2) and are an integral part of the file system. NiLFS(2) creates these checkpoints as changes are made, but you can also force a checkpoint to occur."

Posted Nov 6, 2009 14:32 UTC (Fri) by johnflux (subscriber, #58833)

Very interesting - what was old is now new again.

Ext3 and similar filesystems try to avoid fragmentation, and so write (semi-)randomly across the disk. But if you assume that you will never delete data (e.g. a Time Machine-style backup disk), then a log-based system is incredibly simple and powerful, letting you roll back to any point in time.
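
To make the "roll back to any point in time" idea concrete, here is a toy sketch in Python. It is not how NiLFS(2) is implemented; the class `AppendOnlyLog` and its methods are hypothetical names chosen for illustration, and the "disk" is just a list. Every write is appended, and a checkpoint only records how long the log was at that moment.

```python
class AppendOnlyLog:
    """Toy log-structured store: writes are appended, nothing is overwritten."""

    def __init__(self):
        self.log = []          # (path, data) records, oldest first
        self.checkpoints = []  # log length recorded at each checkpoint

    def write(self, path, data):
        self.log.append((path, data))

    def checkpoint(self):
        """Remember the current head of the log and return a checkpoint id."""
        self.checkpoints.append(len(self.log))
        return len(self.checkpoints) - 1

    def view(self, checkpoint_id=None):
        """Replay the log up to a checkpoint (or up to the head if None)."""
        end = len(self.log) if checkpoint_id is None else self.checkpoints[checkpoint_id]
        state = {}
        for path, data in self.log[:end]:
            state[path] = data  # later records shadow earlier ones
        return state


fs = AppendOnlyLog()
fs.write("notes.txt", "draft 1")
cp0 = fs.checkpoint()
fs.write("notes.txt", "draft 2")

print(fs.view(cp0))  # {'notes.txt': 'draft 1'}
print(fs.view())     # {'notes.txt': 'draft 2'}
```

The old record is still in the log after the second write, which is why the earlier state can be reconstructed.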

A couple of things that I don't get, though (I don't know much about filesystems):

1) Writing 2 files at once will basically interleave the two files on the disk, right?

2) Writing a file that never changes: what happens when your log wraps around to the beginning again and wants to overwrite the file that hasn't changed? I suppose you could skip over it, treating the very old file as suddenly being very new again.

Posted Nov 6, 2009 17:00 UTC (Fri) by efexis (guest, #26355)

1 - At a guess, this may happen. Depending on the implementation, though, the effect may be reduced by delayed allocation and writeback, which can batch writes so that they reach the disk together and contiguously, rather than one by one as they happen, which would interleave the files.
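
A small Python sketch of that difference, under the assumption that the "disk" is a flat list and that each write is a (file name, block number) pair. The function names `layout_immediate` and `layout_delayed` are made up for this example and do not correspond to any real filesystem API.

```python
from collections import defaultdict


def layout_immediate(writes):
    """Place each block on disk in arrival order (no buffering)."""
    return [name for name, _block in writes]


def layout_delayed(writes):
    """Buffer blocks per file, then flush each file's blocks contiguously."""
    buffered = defaultdict(list)
    for name, block in writes:
        buffered[name].append(block)
    layout = []
    for name, blocks in buffered.items():
        layout.extend([name] * len(blocks))
    return layout


# Two files being written at the same time, one block each in turn.
writes = [("a", 0), ("b", 0), ("a", 1), ("b", 1)]

print(layout_immediate(writes))  # ['a', 'b', 'a', 'b']
print(layout_delayed(writes))    # ['a', 'a', 'b', 'b']
```

The delayed version only helps for blocks that are still in memory when the flush happens; anything written in a later flush lands elsewhere.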

2 - You don't write over data that's still in use. Usually you set a data retention policy that says how long to keep snapshots (or at what resolution: e.g. every week take a snapshot that is kept for a year, while all other snapshots taken during the week can be overwritten after 2 weeks). If you need to write something to disk but all of the space is occupied by current data or snapshots, you simply get an out-of-disk-space error, as you would with any other filesystem, so it's not wise to set an overly optimistic data retention policy.
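
The example policy above (weekly snapshots kept for a year, all others for two weeks) could be sketched like this in Python. `Snapshot` and `is_reclaimable` are hypothetical names for illustration only, not part of any real snapshot tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Snapshot:
    taken: datetime
    weekly: bool  # True for the one snapshot per week that is kept long-term


def is_reclaimable(snap, now):
    """Return True once the snapshot has outlived its retention period."""
    keep_for = timedelta(days=365) if snap.weekly else timedelta(weeks=2)
    return now - snap.taken > keep_for


now = datetime(2009, 11, 6)
ordinary = Snapshot(taken=datetime(2009, 10, 1), weekly=False)
weekly = Snapshot(taken=datetime(2009, 10, 1), weekly=True)

print(is_reclaimable(ordinary, now))  # True: older than two weeks
print(is_reclaimable(weekly, now))    # False: still within its year
```

Space held only by reclaimable snapshots can be reused; if nothing is reclaimable and the disk is full, the write fails.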

Posted Nov 12, 2009 4:29 UTC (Thu) by Simetrical (guest, #53439)

1) More than that: writing a file and then changing one block (or appending
a block) will put different parts of the file in totally different places.
A log-structured filesystem is more or less guaranteed to be 100% fragmented
for files larger than a few blocks or files that change often. This is why
it's really only suitable for SSDs, where fragmentation borders on
irrelevant.

2) Log-structured filesystems devote much ingenuity to garbage collection.
They delete or reposition old entries intelligently as needed. Either the
file will be skipped, or moved ahead. (I would assume skipped, but I don't
know much of anything about log-structured filesystems.)
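
To illustrate point 1, here is a toy Python sketch in which the log is a list of (file name, block number) records and the newest record for a block wins. The helper `block_locations` is a hypothetical name for this example and ignores any cleaning or defragmentation.

```python
def block_locations(log, name):
    """Return the log position currently holding each block of `name`."""
    latest = {}
    for pos, (fname, block_no) in enumerate(log):
        if fname == name:
            latest[block_no] = pos  # a newer record supersedes the older one
    return [latest[b] for b in sorted(latest)]


# File "f" is written as three blocks, another file is written,
# then block 1 of "f" is changed and therefore appended again.
log = [("f", 0), ("f", 1), ("f", 2), ("g", 0), ("f", 1)]

print(block_locations(log, "f"))  # [0, 4, 2]
```

Reading "f" sequentially now means jumping from position 0 to 4 and back to 2, which is the fragmentation described above.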

Posted Nov 19, 2009 15:46 UTC (Thu) by forthy (guest, #1525)

Actually, optimizing for reads is dead easy. Log-structured file systems are by nature also copy-on-write, so just read the files in and write them out ("in place") in the order you want them to be read (e.g. the files loaded at boot or login time). Result: all nicely defragmented and in the right order, so that readahead works perfectly.

The main recipe for avoiding fragmentation is delayed allocation anyway, and that works the same for log-structured file systems as for ext3. The files that end up badly fragmented are logs, or similar files that are written in small chunks over a long period of processing. These files have to be cleaned up during garbage collection.

A log-structured file system that keeps cryptographically strong checksums/hashes for all files can easily remove redundant copies by looking at the hashes only. When idle, it can copy the most fragmented files and the ones from sparsely used areas of the disk, to get free space. Keeping strong hashes is only easy for append-only files, so randomly written files like database files still won't be compressed.
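
A minimal Python sketch of finding redundant copies by hash alone, assuming whole-file SHA-256 digests and an in-memory dict standing in for the filesystem. `find_duplicates` is a hypothetical helper, not a feature of any filesystem discussed here.

```python
import hashlib


def find_duplicates(files):
    """Group file names whose contents have the same SHA-256 digest."""
    by_hash = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        by_hash.setdefault(digest, []).append(name)
    return [names for names in by_hash.values() if len(names) > 1]


files = {"a.txt": b"same bytes", "b.txt": b"same bytes", "c.txt": b"other"}

print(find_duplicates(files))  # [['a.txt', 'b.txt']]
```

In a real system the digests would be stored alongside the files and kept up to date on write, which is the part that is hard for randomly written files.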

Posted Nov 12, 2009 11:12 UTC (Thu) by michaeljt (subscriber, #39183)

http://lwn.net/Articles/353411/ looks into those questions in some detail.

Posted Nov 15, 2009 16:44 UTC (Sun) by anton (guest, #25547)

1) Stuff written between checkpoints can be arranged for fast reading. So if the files are written completely between two checkpoints, there is no need to interleave them. If the files are both written across many checkpoints, they will be fragmented unless some other optimization is made.

2) If you keep all checkpoints, the log does not wrap. When you reach the end of the disk, you are out of space. OTOH, if you only keep some checkpoints, some space is likely to become free. The classical solution in log-structured file systems was to have a cleaner that collects all the live data from a segment and copies it to another segment; the cleaner could also be used to defragment files that were fragmented across multiple checkpoints.
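
The classical cleaner can be sketched roughly as follows in Python. This is a simplified model, assuming segments are lists of block labels and liveness is given as a set; `clean_segment` is a hypothetical name, and real cleaners also have to choose which segment to clean and update block pointers.

```python
def clean_segment(segments, victim, live):
    """Copy the live blocks of segment `victim` to a fresh segment, then free it."""
    survivors = [b for b in segments[victim] if b in live]
    segments[victim] = []           # the victim segment can now be reused
    if survivors:
        segments.append(survivors)  # live data is rewritten at the head of the log
    return len(survivors)


# Segment 0 holds two dead blocks (a1, b1) and one live block (a2).
segments = [["a1", "b1", "a2"], ["b2", "c1"]]
live = {"a2", "b2", "c1"}

copied = clean_segment(segments, 0, live)
print(copied)    # 1
print(segments)  # [[], ['b2', 'c1'], ['a2']]
```

Every copied block is extra write traffic, and on a mostly full disk most blocks in a segment are live, so the copying cost per freed segment goes up.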

My impression is that classical log-structured file systems failed to become popular because of the cleaner. One claimed advantage of LFSs was writing speed, but clustering sped up traditional file systems, and the cleaner reduced the writing speed of LFSs substantially in the usual case (mostly-full disks). Nowadays copy-on-write filesystems combine the copy-on-write benefits of classical LFSs with the free-block-based disk allocation of clustered file systems and its advantages: no need for a cleaner and ways to avoid fragmentation in most of the usual cases.

Posted Nov 6, 2009 16:54 UTC (Fri) by trasz (guest, #45786)

How is it different from simply creating filesystem snapshots every few minutes and removing the oldest ones?

Posted Nov 6, 2009 18:57 UTC (Fri) by drag (subscriber, #31333)

That's what is going on.

You make it sound easy, but it is not. The trick is to use as much
disk space as possible to store as much history as possible without
degrading performance or risking data loss.

There have been lots of attempts at log-based file systems in the past, but
they all hit a wall when they ran out of disk space and needed to do garbage
collection efficiently. It is very difficult to get right.

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds