By Jonathan Corbet
October 29, 2012
In just a few days, a linux-kernel mailing list report of ext4 filesystem
corruption turned into a widely-distributed news story; the quality of ext4
and its maintenance, it seemed, was in doubt. Once the dust settled, the
situation turned out to be rather less grave than some had thought; the bug
in question only threatened a very small group of ext4 users using
non-default mount options. As this is being written, a fix is in testing
and should be making its way toward the mainline and stable kernels
shortly. The bug was
obscure, but there is value in looking at how it came about and the ripples
it caused.
The timeline
On October 23, user "Nix" was trying to help track down an NFS lock
manager crash when he ran into a little problem: the crash kept corrupting his filesystem, making the
debugging task rather more difficult than it would otherwise have been. He
reported the problem to the linux-kernel mailing list; he
also posted a warning for other LWN
readers. The ext4 developers moved quickly to find the problem, coming up
with a hypothesis within a few hours of the
initial report. Unfortunately, the hypothesis turned out to be wrong.
Before that became clear, a number of news outlets had posted
articles on the problem. LWN was not the first to do so ("first" is not at
the top of our list of priorities), but, late on the 24th, we, too, posted
an item about the issue. It soon became
clear, though, that the original hypothesis did not hold water and that
further investigation was in order. That investigation, as it turns out,
took a few days to play out.
Eric Sandeen eventually tracked the problem down to this
commit, which found its way into the mainline during the 3.4 merge
window. That change was meant to be a cleanup, gathering the inode
allocation logic into a single function and removing some duplicated code.
Its unintended result was that the inode bitmap was modified outside of a
transaction, introducing unchecksummed data into the
journal. If the system crashed while that data was in the journal, the next mount would
encounter checksum errors and refuse to play back the journal; the
filesystem would then appear to be corrupt.
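The failure mode can be illustrated with a toy model. This is not jbd2's code: the `Transaction` and `JournalRecord` classes and the `commit()` and `replay()` functions are hypothetical, and `zlib.crc32` stands in for whatever checksum the real journal uses. The model assumes that the checksum covers the transaction's own view of each block, while the journal receives whatever is in memory when the block is written out. A change made outside the transaction causes those two to disagree, and replay then refuses the record.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical toy transaction: block number -> contents it logged."""
    blocks: dict = field(default_factory=dict)


@dataclass
class JournalRecord:
    """What the toy journal holds for one committed transaction."""
    blocks: dict   # block contents as actually written to the journal
    checksum: int  # checksum stored with the commit record


def checksum_blocks(blocks: dict) -> int:
    # zlib.crc32 is only a stand-in for the real journal checksum.
    crc = 0
    for blocknr in sorted(blocks):
        crc = zlib.crc32(blocks[blocknr], crc)
    return crc


def commit(txn: Transaction, memory: dict) -> JournalRecord:
    """Checksum the transaction's view of each block, then copy the
    current in-memory buffers into the journal. A buffer changed outside
    the transaction reaches the journal without being covered by the
    checksum."""
    csum = checksum_blocks(txn.blocks)
    written = {nr: bytes(memory[nr]) for nr in txn.blocks}
    return JournalRecord(blocks=written, checksum=csum)


def replay(record: JournalRecord) -> bool:
    """True if the record's checksum matches, so it may be replayed."""
    return checksum_blocks(record.blocks) == record.checksum


# Block 7 plays the part of an inode bitmap block.
memory = {7: bytearray(16)}

# Correct path: the bitmap change is made under the transaction.
txn = Transaction()
memory[7][0] |= 0x01                 # mark an inode as in use
txn.blocks[7] = bytes(memory[7])     # the transaction logs the new state
print(replay(commit(txn, memory)))   # True

# Buggy path: the bitmap changes again outside any transaction.
txn = Transaction()
txn.blocks[7] = bytes(memory[7])
memory[7][0] |= 0x02                 # a change the transaction never saw
print(replay(commit(txn, memory)))   # False: replay is refused
```

A real journal does considerably more than this. The model does show, however, why a checksum mismatch halts replay: once the checksum fails, nothing indicates which version of the block can be trusted.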
The interesting thing is that most systems will never encounter this
problem, because on those systems the journal checksums do not exist at
all. Journal checksumming is an optional feature, not enabled by
default and, evidently, not widely used. Nix had turned on the feature
somewhat inadvertently; most other users do not turn it on at all, even if
they are aware it exists. Anybody who has journal checksums
turned off will not be affected by this bug, so very few ext4 users need
to be concerned about potential data corruption.
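For readers who want to know whether a given system could be affected, one rough check is to look at the options of the mounted ext4 filesystems. The Python sketch below scans text in the `/proc/mounts` format for `journal_checksum` and `journal_async_commit`. Both are ext4 mount options, and the ext4 documentation describes `journal_async_commit` as enabling `journal_checksum` internally. The function name is hypothetical. The sketch also assumes that the kernel reports these options in `/proc/mounts`, and it ignores kernel versions and on-disk journal feature flags. An empty result is therefore a hint, not proof.

```python
# ext4 mount options that turn on journal checksums (assumption: the
# kernel lists them in the options field of /proc/mounts).
CHECKSUM_OPTIONS = {"journal_checksum", "journal_async_commit"}


def ext4_mounts_with_journal_checksums(mounts_text: str) -> list:
    """Return mount points of ext4 filesystems whose option list includes
    a journal-checksum option. Expects /proc/mounts-style lines:
    device, mount point, type, options, dump, pass."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype != "ext4":
            continue
        if CHECKSUM_OPTIONS & set(options.split(",")):
            hits.append(mountpoint)
    return hits


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            print(ext4_mounts_with_journal_checksums(f.read()))
    except OSError:
        print("/proc/mounts is not available on this system")
```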
As an interesting aside, checksums on the journal are a somewhat
problematic feature; as seen in this discussion
from 2008, it is not at all clear what the best response should be when
journal checksums fail to match. The journal checksum may not be
information that the system can reasonably act upon; indeed, as in this
case, it may create problems of its own.
Eric's patch appears to fix the problem;
journal corruption that was easily observed before the patch was applied no longer
occurs afterward. There will naturally be a period of review and testing
before this change is merged into the mainline; nobody wants to create a
new problem through undue haste. Even so, kernel
releases with a version of the fix (it has already been revised once) should be available to users in short
order. But most
users will not really care, since they were not affected by the problem in
the first place. They may care more about the plans to improve the
filesystem test suites so that regressions of this nature can be more
easily caught in the future.
Analysis
In retrospect, the media coverage of this bug was clearly out of proportion
to that bug's impact. One might attribute that to a desire for sensational
stories to drive traffic, and that may well be part of what was going on.
But there are a couple of other factors that are worth keeping in mind
before jumping to that judgment:
- Many media outlets employ editors and writers who, almost beyond
belief, are not trained in kernel programming. That makes it very
hard for them to understand what is really going on behind a
linux-kernel discussion even if they read that discussion rather than
basing a story on a single message received in a tip. They will see a
subject like "Apparent serious progressive ext4 data corruption,"
along with messages from prominent developers seemingly confirming the
problem, and that is what they have to go with. It is hard to blame
them for seeing a major story in this thread.
- Even those who understand linux-kernel discussions (LWN, in its arrogance,
places itself in this category) can be faced with an urgent choice. If
there were a data corruption bug in recent kernels, then we would
be beyond remiss to fail to warn our readers, many of whom run the
kernels in question. There comes a point where, in the absence of
better information, there is no alternative to putting something out
there.
The ext4 developers certainly cannot be faulted for the way this story
went. They did what conscientious developers do: they dropped everything
to focus on what appeared to be a serious regression affecting their
users. They might have avoided some of the splash by taking the discussion
private and not saying anything until they were certain of having found the
real problem, but that is not the way our community works. It is hard to
imagine that pushing development discussions out of the public view is
going to make things better in the long run.
Thus, one might conclude that we are simply going to see an occasional
episode like this, where a bug report takes on a life of its own and is
widely distributed before its impact is truly understood. Early reports of
software problems, arguably, should be treated like early software:
potentially interesting, but likely to be in need of serious review and
debugging. That's simply the world we live in.
A more serious concern may apply to the addition of features to the ext4
filesystem. Ext4 is viewed as the stable, production filesystem in the
Linux kernel, the one we're supposed to use while waiting for Btrfs to
mature. One might well question the addition of new features to this
filesystem, especially features that prove to be rarely used or that don't
necessarily play well with existing features. And, sure enough, Linux
filesystem developers have raised just this
kind of worry in the past. In the end, though, the evolution of ext4
is subject
to the same forces as the rest of the kernel; it will go in the directions
that its developers drive it. There is interest in enhancing ext4, so
new features will find their way in.
Before getting too worried about this prospect, though, it is worth
thinking about the history of ext4. This filesystem is heavily used with
all kinds of workloads; any problems lurking within will certainly emerge
to bite somebody. But problems that have affected real users have been
exceedingly rare and, even in this case, the number of affected users
appears to be countable without running out of fingers. Ext4, in other
words, has a long and impressive record of stability, and its developers
are determined to keep it that way; this bug can be viewed as the sort of
exception that proves the rule. One should never underestimate the value
of good backups, but, with ext4, the chances of having to actually use
those backups remain quite small.