The 2.4.20 ext3 corruption bug
[Posted December 11, 2002 by corbet]
Shortly after the release of the 2.4.20 stable kernel, word got out that
there was a bug which could lead to corruption on ext3 filesystems. This
particular bug will not affect all that many users: to be bitten, one must
(1) use the non-default data=journal option, and (2) unmount the
filesystem after making changes, but before those changes are synced to
disk. Nonetheless, filesystem corruption is not a
good feature to include in a stable kernel release.
2.4.20 users who wish to be protected from this bug should apply the fix posted by Andrew Morton. Andrew's
posting also includes some information on how the bug came to be.
The trouble, it seems, comes from a longstanding confusion between two operations:
- Flushing data to a filesystem to get it out of main memory, and
- Fully synchronizing a filesystem to get it into a consistent, current
state on disk.
The write_super() filesystem operation once performed the second
operation above. A full sync, however, requires waiting for all of the I/O
operations to complete. Most of the time, that is not what the kernel
wants to do; it simply wants to get dirty buffers headed toward the disk
sometime soon. So the ext3 write_super() method was made
asynchronous, as a way of increasing performance. After another tweak went
in, however, the lack of synchronization allowed the filesystem to be
unmounted before the data actually made it to disk. And that, of course,
led to corruption.
The solution is to properly separate the two operations. Andrew's patch
adds a new sync_fs() operation, which writes everything to the
filesystem and does not return until the job is done. With this patch in
place, write_super() can be safely made into an asynchronous flush
operation; kernel code which needs to be sure that everything has been
written out will use sync_fs() instead.
Andrew has also posted a version of the
patch for the 2.5 kernel. It is a more extensive change (though the
patch is still small) in that it tries to improve performance by getting
all sync operations going before waiting for any of them.