Ext4 data corruption trouble [Updated]
The problem, as explained in this note from Ted Ts'o, has to do with how the ext4 journal is managed. In some situations, unmounting the filesystem fails to truncate the journal, leaving stale (but seemingly valid) data there. After a single unmount/remount (or reboot) cycle little harm is done; some old transactions just get replayed unnecessarily. If the filesystem is quickly unmounted again, though, the journal can be left in a corrupted state; that corruption will be helpfully replayed onto the filesystem at the next mount.
Fixes are in the works. The ext4 developers are taking some time, though, to be sure that the problem has been fully understood and completely fixed; there are signs that the bug may have roots far older than the patch that actually caused it to bite people. Once that process is complete, there should be a new round of stable updates (possibly even for 3.5, which is otherwise at end-of-life) and the world will be safe for ext4 users again.
(Thanks are due to LWN reader "nix" who alerted readers in the comments and reported the bug to the ext4 developers).
Update: Ted now thinks that his
initial diagnosis was incomplete at best; the problem is not as well
understood as it seemed. Stay tuned.
