The 2.5 kernel development process has put a strong emphasis on scalability
and performance issues. So it is somewhat interesting that the core Linux
filesystems - ext2 and ext3 - have seen relatively little scalability work
in 2.5. That is beginning to change, at least for ext2, but this work is
raising some interesting questions about what the role of these two
filesystems really is.
Alex Tomas has recently been working on performance bottlenecks in ext2.
His first concurrent block allocation patch
attacks the problem of allocating blocks within a filesystem. The current
ext2 code takes out the superblock lock before performing block allocation;
this means that only one thread can be trying to allocate space in a given
filesystem at a time. The first patch created a separate "allocation lock"
which protects the small piece of code which actually makes allocation
decisions; a later revision creates a
separate lock for each block group within the filesystem, thus reducing
lock contention further.
The patch was greeted with positive reviews. William Lee Irwin reported a throughput increase from
62 MB/s to 104 MB/s on a benchmark he ran, and exclaimed
"This patch is a godsend. Whoever's listening, please apply!"
Martin Bligh, meanwhile, said "SDET on
my machine (16x NUMA-Q) has fallen in love with your patch, and has decided
to elope with it to a small desert island." Not bad for a patch
which is really a pretty straightforward exercise in finer-grained locking.
The block allocation patch was quickly joined by a concurrent inode allocation patch and a distributed counters patch. None of these have
found their way into the mainline kernel yet, but they offer enough
performance benefits that they will likely get there eventually. Assuming
the block allocation patch can be coaxed back from its desert island
experience, that is.
A question was raised, however: is ext2 the right place for this sort of
work? ext2 is generally thought of as the relatively simple Linux
filesystem; ext3 is the place for fancy new stuff. There are a couple of
reasons why this sort of work tends to find its way into ext2 first.
One of those reasons is the simple fact that ext3 still has bigger scaling
problems. The ext3 filesystem is one of the few places in the Linux kernel
that still makes heavy use of the big kernel lock (BKL). As a result, ext3
does not scale well to large systems, and tweaking things like block
allocation will not help the real problem. Until the BKL dependency is
removed from ext3, most other performance work will not make much sense.
Removing the BKL is apparently a somewhat tricky job; at this point, it may
well not happen before 2.6 is released.
The other reason is that, large-systems scaling issues notwithstanding,
ext3 is developing into the default Linux filesystem. For most users,
there is little or no incentive to prefer ext2 over ext3; all it takes is
one power failure to make the advantages of a journaling filesystem clear.
So, as Daniel Phillips put it:
Ext2 is growing into the role of experimental filesystem; Ext3 is
now the stable filesystem. Hopefully, the experiments will make
Ext2 smaller, cleaner and at the same time, more powerful, over
time. Sort of like the role that RAMFS plays: besides being
useful, Ext2 should be thought of as a showcase for best filesystem practice.
The role reversal, it seems, is nearly complete. Soon, it will be the ext2
users who are living on the bleeding edge.