By Jonathan Corbet
October 24, 2012
Stable kernel updates are supposed to be just that — stable. But they are
not immune to bugs, as a recent ext4 filesystem problem has shown. In
short: ext4 users would be well advised to avoid versions 3.4.14, 3.4.15,
3.5.7, 3.6.2, and 3.6.3; they all contain a patch which can, in some
situations, cause filesystem corruption.
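As a rough illustration, the check below compares a kernel release string against the five versions listed above. It is only a sketch: the helper names (`base_version`, `is_affected`) are made up for this example, and it assumes the release string begins with the upstream version number. Distribution kernels may carry or omit the problematic patch regardless of their version string, so a match or a miss here is a hint rather than a verdict.

```python
import platform
import re

# Stable releases named above as containing the problematic patch.
AFFECTED = {"3.4.14", "3.4.15", "3.5.7", "3.6.2", "3.6.3"}


def base_version(release: str) -> str:
    """Return the leading dotted-number part of a release string.

    For example, "3.6.3-1-amd64" becomes "3.6.3". Returns an empty
    string if the release does not start with a digit.
    """
    match = re.match(r"\d+(?:\.\d+)*", release)
    return match.group(0) if match else ""


def is_affected(release: str) -> bool:
    """True if the release's base version is one of the listed versions."""
    return base_version(release) in AFFECTED


if __name__ == "__main__":
    release = platform.release()  # e.g. the output of `uname -r` on Linux
    if is_affected(release):
        print(f"{release}: matches a version listed as affected")
    else:
        print(f"{release}: not one of the listed versions")
```

The comparison is an exact match on the base version, so "3.6.30" is not mistaken for "3.6.3".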
The problem, as explained in this note from Ted
Ts'o, has to do with how the ext4 journal is managed. In some
situations, unmounting the filesystem fails to truncate the journal,
leaving stale (but seemingly valid) data there. After a single
unmount/remount (or reboot) cycle, little harm is done; some old
transactions just get replayed unnecessarily. If the filesystem is quickly
unmounted again, though, the journal can be left in a corrupted state; that
corruption will be helpfully replayed onto the filesystem at the next
mount.
Fixes are in the works. The ext4 developers are taking some time, though,
to be sure that the problem has been fully understood and completely fixed;
there are signs that the bug may have roots
far older than the patch that actually caused it to bite people. Once that
process is complete, there should be a new round of stable updates
(possibly even for 3.5, which is otherwise at end-of-life) and the world
will be safe for ext4 users again.
(Thanks are due to LWN reader "nix", who alerted
readers in the comments and reported the bug to the ext4 developers.)
Update: Ted now thinks that his
initial diagnosis was incomplete at best; the problem is not as well
understood as it seemed. Stay tuned.