Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 1, 2011 19:36 UTC (Thu) by nix (subscriber, #2304)
In reply to: Improving ext4: bigalloc, inline data, and metadata checksums by walex
Parent article: Improving ext4: bigalloc, inline data, and metadata checksums

I shouldn't respond to this troll-bait, but nonetheless...

> The big problem with 'ext4' is that its only reason to be is to allow Red Hat customers to upgrade in place existing systems, and what Red Hat wants, Red Hat gets (also because they usually pay for that and the community is very grateful).
Interesting. tytso wasn't working for RH when ext4 started up, and still isn't working for them now. So their influence must be more subtle.

I also see that I was making some sort of horrible mistake by installing ext4 on all my newer systems, but you never make clear what that mistake might have been.

> In particular JFS should have been the "default" Linux filesystem instead of ext[23] for a long time. Not making JFS the default was probably the single worst strategic decision for Linux (but it can be argued that letting GKH near the kernel was even worse).
Ah, yeah. Because stable kernels, USB support, mentoring newbies, the driver core, -staging... all these things were bad.

I've been wracking my brains and I can't think of one thing Greg has done that has come to public knowledge and could be considered bad. So this looks like groundless personal animosity to me.


Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 1, 2011 19:41 UTC (Thu) by andresfreund (subscriber, #69562) [Link]

> I've been wracking my brains and I can't think of one thing Greg has done that has come to public knowledge and could be considered bad. So this looks like groundless personal animosity to me.
Also, uhm. Didn't he work for Suse?

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 2, 2011 11:35 UTC (Fri) by alankila (subscriber, #47141) [Link]

I dimly recall that the animosity originated from the work with udev, and the removal of devfs. Since I personally don't care one bit about this issue, I have a hard time now reconstructing the relevant arguments, but my guess is that some people really hate the idea that a system needs more than just a kernel to be useful.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 2, 2011 18:40 UTC (Fri) by nix (subscriber, #2304) [Link]

udev is prone to provoking frothing at the mouth even in otherwise reasonable people, due to the udev authors' patent lack of concern for backward compatibility. Twice now they've broken existing systems without so much as a by-your-leave. First came the massive migration of all system-provided state out of /etc/udev/rules.d into /lib/udev/rules.d: what, you customized them? Sucks to be you, now you have to customize them before *building* udev. More recently came the abrupt movement of /sbin/udevd into /lib/udev without even leaving behind a symlink! Oh, you were starting that at bootup and relying on it to be there? Sorry, we just broke your bootup, your own fault for not reading the release notes! Hope you don't need to downgrade!

(Yes, I read the release notes, so didn't fall into these traps, but FFS, at least the latter problem was trivial to work around -- one line in the makefile to drop a symlink in /sbin -- and they just didn't bother.)

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 2, 2011 23:40 UTC (Fri) by walex (subscriber, #69836) [Link]

As to udev some people dislike smarmy shysters who replace well designed working subsystems seemingly for the sole reason of making a political landgrab, because the replacement has both more kernel complexity and more userland complexity and less stability.

The key features of devfs were that it would automatically populate /dev from the kernel with basic device files (major, minor) and then use a very simple userland daemon to add extra aliases as required.

It turns out that after several attempts to get it to work udev adds to /sys from inside the kernel exactly the same information, so there has been no migration of functionality from kernel to userspace:

$ ls -ld /dev/tty9
crw--w---- 1 root tty 4, 9 2011-11-28 14:03 /dev/tty9
$ cat /sys/class/tty/tty9/dev
4:9
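
The comparison in the transcript above can also be done programmatically. The following is a minimal sketch, assuming a Linux system with sysfs mounted; the function names (`parse_sysfs_dev`, `split_rdev`, `node_matches_sysfs`) are made up for illustration and are not part of udev or devfs.

```python
import os
import stat


def parse_sysfs_dev(text):
    """Parse the 'MAJOR:MINOR' string found in a sysfs 'dev' attribute."""
    major, minor = text.strip().split(":")
    return int(major), int(minor)


def split_rdev(rdev):
    """Split a raw device number (st_rdev) into a (major, minor) pair."""
    return os.major(rdev), os.minor(rdev)


def node_matches_sysfs(dev_path, sysfs_dev_path):
    """Return True if a /dev node and a sysfs 'dev' attribute agree."""
    st = os.stat(dev_path)
    if not (stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode)):
        raise ValueError(f"{dev_path} is not a device node")
    with open(sysfs_dev_path) as f:
        return split_rdev(st.st_rdev) == parse_sysfs_dev(f.read())


# Hypothetical usage with the paths from the transcript (requires that
# tty9 exists on the machine running this):
#   node_matches_sysfs("/dev/tty9", "/sys/class/tty/tty9/dev")
```

This only checks that both places report the same numbers; it does not say anything about which component created the device node.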

And the userland part is also far more complex and unstable than devfsd ever was (for example devfs did not require cold start).

And udev is just the most shining example of a series of similar poor decisions (which however seem to have been improving a bit with time).

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 3, 2011 3:16 UTC (Sat) by raven667 (subscriber, #5198) [Link]

I'm not sure that is an accurate portrayal of what happened, on this planet at least. My recollection from the time is that there were fundamental technical problems with the devfs implementation, which is why it was redone into udev. I think those problems were some inherent race conditions on device add/removal, plus concerns about how much policy about /dev file names, permissions, etc. was hard coded into the kernel and unmodifiable by an end user or sysadmin. That is just my recollection.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 3, 2011 11:07 UTC (Sat) by nix (subscriber, #2304) [Link]

The latter is doubly ironic now that udev forbids you from changing the names given to devices by the kernel. (You can introduce new names, but you can't change the kernel's anymore.)

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 3, 2011 4:04 UTC (Sat) by alankila (subscriber, #47141) [Link]

To your specific example: obviously the kernel is going to have some kind of (generated) name for a device, and to know the major/minor number pair which is the very thing that facilitates the communication between userspace and kernel... But udev is still controlling things like permissions and aliases for those devices where necessary.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 3, 2011 0:12 UTC (Sat) by walex (subscriber, #69836) [Link]

«tytso wasn't working for RH when ext4 started up, and still isn't working for them now. So their influence must be more subtle. »

Quite irrelevant: a lot of file systems were somebody's hobby file systems, but they did not achieve prominence and instant integration into mainline even if rather alpha, and RedHat did not spend enormous amounts of resources quality assuring them to make them production ready either, and quality assurance is a pretty vital detail for file systems, as the Namesys people discovered.

Pointing to tytso is just misleading. Also because ext4 really was seeded by Lustre people before tytso became active on it in his role as ext3 curator (and in 2005, which is 5 years later than when JFS became available).

Similarly for BTRFS, it has been initiated by Oracle (who have an ext3 installed base), but its main appeal is still as the next inplace upgrade on the Red Hat installed base (thus the interest in trialing it in Fedora, where EL candidate stuff is mass tested), even if for once it is not just an extension of the ext line but has some interesting new angles.

But considering ext4 on its own is a partial view; one must consider the pre-existing JFS and XFS stability and robustness and performance, and from a technical point of view ext4 is not that interesting (euphemism) and its sole appeal is inplace upgrades, and the widest installed base for that is RedHat, and to a large extent that could have been said of ext3 too.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 3, 2011 0:52 UTC (Sat) by nix (subscriber, #2304) [Link]

So you're blaming the Lustre people now? You do realise Lustre is not owned by Red Hat, and never was?

And if you're claiming that btrfs is effectively RH-controlled merely because RH customers will benefit, then *everything* that happens to Linux must by your bizarre definition be RH-controlled. That's a hell of a conspiracy: so vague that the coconspirators don't even realise they're conspiring!

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Apr 13, 2012 19:34 UTC (Fri) by fragmede (subscriber, #50925) [Link]

I thought *Oracle* was a/the big contributor to btrfs...

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 3, 2011 19:45 UTC (Sat) by tytso (subscriber, #9993) [Link]

Sure, and I've always been careful to give the Lustre folk credit for the work that they did between 2003 and 2006 extending ext3 to add support for delayed allocation (which JFS didn't have), multi-block allocation (which JFS didn't have) and extents (OK, JFS had extents).

But you can't have it both ways. If that code had been in use by paying Lustre companies, then it's hardly alpha code, wouldn't you agree?

And why did the Lustre developers at ClusterFS choose ext3? Because the engineers they hired knew ext3, since it was a community-supported distribution, whereas JFS was controlled by a core team that was all IBM'ers, and hardly anyone outside of IBM was available who knew JFS really well.

But as others have already pointed out, there was no grand conspiracy to pick ext2/3/4 over its competition. It won partially due to its installed base, and partially because of the availability of developers who understood it (and books written about it, etc., etc., etc.) The way you've been writing you seem to think there was some secret cabal (at Red Hat?) that made these decisions, and there was a "mistake" because they didn't choose your favorite file systems.

The reality is that file systems all have trade-offs, and what's good for some people is not so great for others. Take a look at some of the benchmarks at btrfs.boxacle.net; they're a bit old, but they are well done, and they show that across many different workloads at that time (2-3 years ago) there was no one single file system that was the best across all of the different workloads. So anyone who only uses a single workload, or a single hardware configuration, and tries to use that to prove that their favorite file system is the "best" is trying to sell you something, or is a slashdot kiddie who has a fan-favorite file system. The reality is a lot more complicated than that, and it's not just about performance. (Truth be told, for many/most use cases, the file system is not the bottleneck.) Issues like availability of engineers to support the file system in a commercial product, the maturity of the userspace support tools, ease of maintainability, etc. are at least as important if not more so.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 3, 2011 20:43 UTC (Sat) by dlang (✭ supporter ✭, #313) [Link]

at the time ext3 became the standard, JFS and XFS had little support (single vendor) and were both 'glued on' to linux with heavy compatibility layers.

Add to this the fact that you did not need to reformat your system to use ext3 when upgrading, and the fact that ext3 became the standard (taking over from ext2, which was the prior standard) is a no-brainer, and no conspiracy.

In those days XFS would outperform ext3, but only in benchmarks on massive disk arrays (which were even more out of people's price ranges at that point than they are today)

XFS was scalable to high-end systems, but its low-end performance was mediocre

looking at things nowadays, XFS has had a lot of continuous improvement and integration, both improving its high-end performance and reliability, and improving its low-end performance without losing its scalability. There are also more people, working for more companies supporting it, making it far less of a risk today, with far more in the way of upsides.

JFS has received very little attention after the initial code dump from IBM, and there is now nobody actively maintaining/improving it, so it really isn't a good choice going forward.

reiserfs had some interesting features and performance, but it suffered from some seriously questionable benchmarking and a few major problems (the fsck scrambling is a huge one). The benchmark that turned me off to it entirely was a spectacular test that reiserfs completed in 20 seconds and that took several minutes on ext*; then we discovered that reiserfs defaulted to a 30 second delay before writing everything to disk, so the entire benchmark was complete before any data started getting written to disk. After that I didn't trust anything that they claimed. It was then abandoned by the developer in favor of the future reiserfs4, with improvements that were submitted being rejected as they were going to be part of the new, incompatible filesystem.
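
To illustrate why a write benchmark can look spectacular when it finishes inside the writeback delay, here is a minimal sketch that times the same write with and without forcing the data to disk before the clock stops. It is not the benchmark described above; the names `timed_write` and `compare` and the 8 MiB payload size are assumptions chosen for illustration.

```python
import os
import tempfile
import time


def timed_write(path, payload, sync):
    """Write payload to path and return the elapsed time in seconds.

    With sync=True the timer stops only after os.fsync() has asked the
    kernel to flush the file's data to the storage device.
    """
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()  # hand Python's userspace buffer to the kernel
        if sync:
            os.fsync(f.fileno())  # wait for the kernel to write it out
    return time.perf_counter() - start


def compare(size=8 * 1024 * 1024):
    """Return (cached_seconds, synced_seconds) for one write of `size` bytes."""
    payload = b"\0" * size
    with tempfile.TemporaryDirectory() as d:
        cached = timed_write(os.path.join(d, "cached"), payload, sync=False)
        synced = timed_write(os.path.join(d, "synced"), payload, sync=True)
    return cached, synced
```

On many systems the unsynced figure mostly measures how fast data can be copied into the page cache, which is why a benchmark that ends before writeback begins says little about the filesystem's on-disk performance.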

ext4 is in large part a new filesystem whose name just happens to be similar to what people are running, but it has now been out for several years, with developers who are responsive to issues, are a diverse set (no vendor lock-in or dependencies) and are willing to say where the filesystem is not the best choice.

btrfs is still under development (the fact that they don't yet have a fsck tool is telling), is making claims that seem too good to be true, and has already run into several cases of pathological behavior that required significant modifications. I wouldn't trust it for anything other than non-critical personal use for another several years.

as a result, I am currently using XFS for the most part, but once I get a chance to do another round of testing, ext4 will probably join it. I have a number of systems that have significant numbers of disks, so XFS will probably remain in use.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 4, 2011 1:12 UTC (Sun) by nix (subscriber, #2304) [Link]

> ext4 is in large part a new filesystem whose name just happens to be similar to what people are running
ext4 is ext3 with a bunch of new extensions (some incompatible): indeed, initially the ext4 work was going to be done to ext3, until Linus asked for it to be done in a newly-named clone of the code instead. It says a lot for the ext2 code and disk formats that they've been evolvable to this degree.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds