LWN: Comments on "Responding to ext4 journal corruption" https://lwn.net/Articles/284037/ This is a special feed containing comments posted to the individual LWN article titled "Responding to ext4 journal corruption". en-us Tue, 07 Oct 2025 13:27:14 +0000 Tue, 07 Oct 2025 13:27:14 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Responding to ext4 journal corruption https://lwn.net/Articles/285351/ https://lwn.net/Articles/285351/ efexis <div class="FormattedComment"><pre> This is what first came to my mind, but if data has been written, but the metadata saying what this data is gets discarded, the new data could be misinterpreted as what the previous metadata said it was (such as believing it to be more metadata pointing to blocks on the disk, when it's actually an image). I guess the solution here would be to zero any pointers to metadata first (or to set a 'corrupt' or 'deleted' flag on the metadata sector itself) and to make sure that's reached the disk before writing the data. Of course this can slow things down, as you have to write to the metadata block an extra time per update. I think the snapshotting way is the only way forward; if you never get rid of something until you are certain the new one works (i.e., has completely reached the disk), then it doesn't matter what you do or when... you'll always have at least one working version. Large writes would start failing when your disk is nearing full, but with today's drive sizes, we're more concerned with losing 500G of data than filling it. </pre></div> Mon, 09 Jun 2008 03:26:14 +0000 Responding to ext4 journal corruption https://lwn.net/Articles/285314/ https://lwn.net/Articles/285314/ Duncan <div class="FormattedComment"><pre> That's probably one of the big reasons I've found reiserfs (3) so stable here, at least after ordered-by-default hit the tree.
I ran a system for some time with an annoying memory bus error issue (generic memory rated one speed notch higher than it should have been; a BIOS update eventually let me limit the memory speed by a notch, after which it was absolutely stable) that would crash the system with MCE errors relatively frequently. 100% reiserfs, no UPS, no problems after ordered-by-default, though I certainly had some previously. I'm running the same system but with a memory and CPU upgrade now, and with reiserfs on mdp/kernel RAID-6, system directly on one RAID-6 partition (with a second for a backup system image), everything else on LVM2 on another one. Despite the lack of barriers on the stack as explained in last week's barrier article, and despite continuing to run without a UPS and having occasional power outages that often trigger a RAID-6 rebuild, I've been VERY happy with system integrity. Duncan </pre></div> Sat, 07 Jun 2008 17:20:37 +0000 Responding to ext4 journal corruption https://lwn.net/Articles/285143/ https://lwn.net/Articles/285143/ anton <a rel="nofollow" href="http://www.complang.tuwien.ac.at/anton/lfs/">I believe in the superiority of copy-on-write file systems</a> over journaling file systems, but problems such as the one discussed can happen in copy-on-write file systems like Btrfs, too, unless they are carefully implemented; i.e., they must not reuse freed blocks until one or two checkpoints have made it to the disk (two, if you want to survive the last checkpoint becoming unreadable). Thu, 05 Jun 2008 18:35:56 +0000 Responding to ext4 journal corruption https://lwn.net/Articles/284621/ https://lwn.net/Articles/284621/ jlokier <div class="FormattedComment"><pre> Another way, which doesn't pin blocks and prevent their reallocation, is to keep track of dependencies in the journal: transaction 3 _depends_ on transaction 2, because it uses blocks which are repurposed in transaction 2. So there should be a note in transaction 3 saying "I depend on T2".
During replay, if transaction 2 fails due to a bad checksum, transaction 3 will be rejected due to the dependency. Transaction 4 may be fine, etc. (The same dependencies can be converted to finer-grained barriers too - e.g. to optimise ext4 on software RAID.) Some RAM is needed to keep track of the dependencies, until commits are known to have hit the platters. If it's a problem, this can be bounded with some hashed approximation akin to a Bloom filter. </pre></div> Sun, 01 Jun 2008 22:09:57 +0000 Responding to ext4 journal corruption https://lwn.net/Articles/284522/ https://lwn.net/Articles/284522/ masoncl <div class="FormattedComment"><pre> Reiserfsv3 and jbd both use write-ahead logging schemes, and so they solve very similar problems. Reiserfs keeps track in RAM of which blocks are pinned and not available for allocation, while jbd uses these revoke records. Keeping track in RAM has performance implications, but it is certainly possible. </pre></div> Fri, 30 May 2008 13:43:34 +0000 Responding to ext4 journal corruption https://lwn.net/Articles/284493/ https://lwn.net/Articles/284493/ jzbiciak <Blockquote><I><OL><LI>A file is created, with its associated metadata.</LI> <LI>That file is then deleted, and its metadata blocks are released.</LI> <LI>Some other file is extended, with the newly-freed metadata blocks being reused as data blocks.</LI></OL></I></BLOCKQUOTE> <P> It seems that if you defer releasing metadata blocks in the in-memory notion of "available space" until the transaction releasing them is well and truly committed (rather than merely "sent to the journal"), you prevent '3' from ever happening.</P> <P>In fact, the general issue seems to be related to storage repurposing. For example, consider blocks freed from file A that get allocated to file B.
If data for B gets written to those blocks but the transactions reassigning those blocks get corrupted across a crash, then file A would hold contents intended for file B.</P> <P>Thus, it seems prudent in<TT> data=ordered </TT>mode to prevent the allocator from reallocating recently freed blocks until the metadata indicating that those blocks are actually free is actually committed. I have no idea how difficult that might be to implement, but it <I>is</I> something that only needs to be tracked in the in-memory notion of "available space."</P> <P>Will this degrade the quality of allocations? It might for nearly full filesystems or filesystems with a lot of churn, but for filesystems that are far from full, I doubt it would have any measurable impact whatsoever. There will be some pool of blocks from recently deleted or truncated files that won't be available for reallocation immediately.</P> <P>Anyone see any holes in this?</P> Fri, 30 May 2008 06:03:26 +0000 Responding to ext4 journal corruption https://lwn.net/Articles/284313/ https://lwn.net/Articles/284313/ nix <div class="FormattedComment"><pre> Writing garbage into the journal is quite easy, too. All it takes is for the disk to forget a single seek after a legitimate journal write, and it'll write something into the journal which was supposed to go elsewhere. (I've seen this on disks running live systems on ext3 for huge banks. The banks were not very happy, because the sysadmins simply unplugged the disk array after the disk errors: so the filesystem was unclean, the journal was replayed... and oops, that's sprayed quite a lot of garbage into the fs, because a multimegabyte logfile write had landed in the journal, and all of that was misinterpreted as metadata. That specific case, in which the blocks look nothing like journal blocks at all, was plugged in e2fsprogs 1.40, but the bank was using a version of RHEL that was still on 1.35...) </pre></div> Thu, 29 May 2008 13:23:34 +0000
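[Editor's note] jlokier's journal-dependency scheme above can be sketched in a few lines. This is a toy model only, not jbd or ext4 code; the journal-entry layout (`txn_id`, `checksum_ok`, `depends_on`) is invented for illustration, assuming each transaction records the IDs of earlier transactions whose blocks it repurposes.

```python
def replay(journal):
    """Replay transactions in journal order.

    A transaction is skipped if its checksum fails, or if it depends
    (directly or transitively) on a transaction that was skipped --
    jlokier's "I depend on T2" note.  Each entry is a tuple
    (txn_id, checksum_ok, depends_on), with depends_on a set of
    earlier transaction IDs.
    """
    failed = set()   # transactions rejected so far
    applied = []     # transactions safely replayed
    for txn_id, checksum_ok, depends_on in journal:
        if not checksum_ok or depends_on & failed:
            # Poison this transaction and, via `failed`, everything
            # downstream that depends on it.
            failed.add(txn_id)
            continue
        applied.append(txn_id)
    return applied
```

With a corrupted transaction 2, transaction 3 (which depends on it) is rejected while the independent transaction 4 still replays, matching the behaviour described in the comment.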
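[Editor's note] jzbiciak's deferred-reallocation proposal can likewise be sketched as an in-memory allocator that refuses to hand freed blocks back out until the journal layer reports the freeing transaction as durably committed. All class and method names here are hypothetical; real allocators track extents and bitmaps, not Python sets.

```python
class DeferredAllocator:
    """Toy block allocator: blocks freed in a transaction stay
    unallocatable until that transaction is durably committed."""

    def __init__(self, n_blocks):
        self.free = set(range(n_blocks))  # allocatable right now
        self.pending = {}                 # txn id -> blocks freed in it

    def alloc(self):
        if not self.free:
            raise RuntimeError("no committed-free blocks available")
        return self.free.pop()

    def release(self, txn, block):
        # Freed in `txn`, but not reusable until `txn` hits the platters.
        self.pending.setdefault(txn, set()).add(block)

    def committed(self, txn):
        # Journal layer reports `txn` durably committed:
        # its freed blocks may now be repurposed safely.
        self.free |= self.pending.pop(txn, set())
```

A block released under transaction 1 cannot be handed out again until `committed(1)` is called, which is the property that prevents step '3' of the quoted scenario from repurposing not-yet-committed metadata blocks.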