Improving ext4: bigalloc, inline data, and metadata checksums

Posted Nov 30, 2011 0:32 UTC (Wed) by pr1268 (subscriber, #24648)
In reply to: Improving ext4: bigalloc, inline data, and metadata checksums by bpepple
Parent article: Improving ext4: bigalloc, inline data, and metadata checksums

Thanks for the pointer, and thanks also to yoe's reply above. But, my music collection (currently over 10,000 files) has existed for almost four years, ever since I converted the entire collection from MP3 to OGG (via a homemade script which took about a week to run).[1] (I've never converted from FLAC to OGG, although I do have a couple of FLAC files.) I never noticed any corruption in the OGG files until a few months ago, shortly after I did a clean OS re-install (Slackware 13.37) on bare disks (including copying the music files).[2] I'm all too eager to blame the corruption on ext4 and/or LVM, since those were the only two things that changed immediately prior to the corruption, but you both bring up a good point that maybe I should dig a little deeper into finding the root cause before I jump to conclusions.

[1] I've had this collection of (legitimately acquired) songs for years prior, even having it on NTFS back in my Win2000/XP days. I abandoned Windows (including NTFS) in August 2004, and my music collection was entirely MP3 format (at 320 kbit) since I got my first 200GB hard disk. After seeing the benefits of the OGG Vorbis format, I decided to switch.

[2] I have four physical disks (volumes), and I've set up a PV set spanning all of them for fast I/O performance. I'm not totally impressed with the performance (it is somewhat faster), but that's a whole other discussion.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Nov 30, 2011 0:57 UTC (Wed) by yokem_55 (subscriber, #10498) [Link]

I would also run smartctl -l error to see if your hard drives are bugging out, and maybe even run memtest86+ overnight to see if you are having memory errors. Weird, widespread data corruption (with metadata intact) tends, in my experience, to be more hardware related than anything else.

ext4 experience

Posted Nov 30, 2011 2:11 UTC (Wed) by dskoll (subscriber, #1630) [Link]

I also had a very nasty experience with ext4. A server I built using ext4 suffered a power failure and the file system was completely toast after it powered back up. fsck threw hundreds of errors and I ended up rebuilding from scratch.

I have no idea if ext4 was the cause of the problem, but I've never seen that on an ext3 system. I am very nervous... possibly irrationally so, but I think I'll stick to ext3 for now.

ext4 experience

Posted Nov 30, 2011 4:52 UTC (Wed) by ringerc (subscriber, #3071) [Link]

The usual culprit in those sorts of severe corruption or loss cases is aggressive write-back caching without battery backup. Some cheap RAID controllers will let you enable write-back caching without a BBU, and some HDDs support it too.

Write-back caching on volatile storage without careful use of write barriers and forced flushes *will* cause severe data corruption if the storage is cleared due to (eg) unexpected power loss.
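
To make the "forced flushes" point concrete, here is a minimal sketch of what an application can do on its side. It assumes a POSIX system, and the helper name durable_write is hypothetical. Note that fsync only asks the kernel to push data to the device; whether a drive or controller with a volatile write-back cache actually honours that request is exactly the problem described above.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Replace `path` with `data`, flushing as far as the OS lets us (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer into the kernel
            os.fsync(tmp.fileno())  # ask the kernel to push it to the device
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Flush the directory as well, so the rename itself is on stable storage.
    # Opening a directory like this is a POSIX assumption.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If the storage stack drops flushes (no barriers, cache without battery backup), none of this helps, which is why the hardware and LVM configuration matter as much as the filesystem.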

ext4 experience

Posted Nov 30, 2011 9:00 UTC (Wed) by Cato (subscriber, #7643) [Link]

You are right about battery backup. Every modern hard disk uses writeback caching, and some of them make it hard to ensure that the cache is flushed when the kernel wants to ensure a write barrier is implemented. The size of hard disk caches (32 MB typically) and the use of journalling filesystems (concentrating key metadata writes in journal blocks) can mean that a power loss or hard crash loses a large amount of filesystem metadata.

ext4 experience

Posted Nov 30, 2011 12:40 UTC (Wed) by dskoll (subscriber, #1630) [Link]

My system was using Linux Software RAID, so there wasn't a cheap RAID controller in the mix. You could be correct about the hard drives doing caching, but it seems odd that I've never seen this with ext3 but did with ext4. I am still hoping it was simply bad luck, bad timing, and writeback caching... but I'm also still pretty nervous.

ext4 experience

Posted Nov 30, 2011 12:50 UTC (Wed) by dskoll (subscriber, #1630) [Link]

Ah... reading makes me think I was a victim of LVM and no write barriers. I've followed the suggestions in that article. So maybe I'll give ext4 another try.

ext4 experience

Posted Nov 30, 2011 20:20 UTC (Wed) by walex (subscriber, #69836) [Link]

You have been wishing for O_PONIES!

It is a very well known issue usually involving unaware sysadms and cheating developers.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Nov 30, 2011 2:13 UTC (Wed) by nix (subscriber, #2304) [Link]

Quite so. I've been using ext4 atop LVM (atop raid1 md, raid5 md, and Areca hardware RAID) for many years, and have never encountered a single instance of fs corruption which fsck could not repair -- and only one severe enough to prevent mounting which was not attributable to abrupt powerdowns, and *that* was caused by a panic at the end of a suspend, and e2fsck fixed it.

I'm quite willing to believe that bad RAM and the like can cause data corruption, but even when I was running ext4 on a machine with RAM so bad that you couldn't md5sum a 10Mb file three times and get the same answer thrice, I had no serious corruption (though it is true that I didn't engage in major file writing while the RAM was that bad, and I did get the occasional instances of bitflips in the page cache, and oopses every day or so).
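
The "md5sum a file three times" check is easy to script. The following is a rough sketch, not a diagnostic tool: the function name and defaults are assumptions, and on a healthy machine the repeated reads will usually be served from the page cache, so all it can show is whether the same bytes hash the same way each time.

```python
import hashlib


def repeated_digests(path: str, rounds: int = 3, chunk_size: int = 1 << 20) -> set:
    """Hash the same file `rounds` times and return the set of distinct MD5 digests."""
    digests = set()
    for _ in range(rounds):
        h = hashlib.md5()
        with open(path, "rb") as f:
            # Read in chunks so large files do not need to fit in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        digests.add(h.hexdigest())
    return digests
```

A result with more than one digest means the machine gave different answers for the same file, which points at RAM or the I/O path rather than at the filesystem's on-disk state.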


Posted Nov 30, 2011 12:49 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

"occasional instances of bitflips in the page cache"

To someone who isn't looking for RAM/cache issues as the root cause, those often look just like filesystem corruption of whatever kind. They try to open a file, get an error saying it's corrupted. Or they run a program and it mysteriously crashes.

If you _already know_ you have bad RAM, then you say "Ha, bitflip in page cache" and maybe you flush a cache and try again. But if you've already begun to harbour doubts about Seagate disks, or Dell RAID controllers, or XFS then of course that's what you will tend to blame for the problem.


Posted Dec 1, 2011 19:23 UTC (Thu) by nix (subscriber, #2304) [Link]

This does depend on how bad the RAM was. The RAM on this machine was so bad that the fs was not the only thing misbehaving by any means.

Rare bitflips are normally going to be harmless or fixed up by e2fsck, one would hope. There may be places where a single bitflip, written back, toasts the fs, but I'd hope not. (The various fs fuzzing tools would probably have helped comb those out.)

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Nov 30, 2011 10:19 UTC (Wed) by (subscriber, #26289) [Link]

Not related to the current discussion: I hope you are aware that transcoding your MP3 collection to Vorbis only decreased their audio quality:

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Nov 30, 2011 15:35 UTC (Wed) by pr1268 (subscriber, #24648) [Link]

From that article: Mp3 to Ogg Ogg -q6 was required to achieve transparency against the (high-quality) mp3 with difficult samples.

I used -q8 (or higher) when transcoding with oggenc(1); I've done extensive testing by transcoding back-and-forth to different formats (including RIFF WAV) and have never noticed any decrease in audio quality or frequency response, even when measured with a spectrum analyzer. I do value your point, though.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 1, 2011 22:54 UTC (Thu) by job (guest, #670) [Link]

Just to clarify for everyone (who perhaps stumbles in via a web search): converting from mp3 to ogg, or indeed any time you apply lossy compression to something already lossy compressed, can only make the quality worse. The best case here is "at least not audibly worse".

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 10, 2011 1:04 UTC (Sat) by ibukanov (subscriber, #3942) [Link]

When one approximates another approximation, it is possible that the result will be closer to the original than the initial approximation. So in theory one can get a better result with MP3->OGG conversion. For this reason, if tests show that people cannot detect the difference with a *properly* done conversion, then I do not see how one can claim that it can only make the quality worse.

Lossy format conversion

Posted Dec 10, 2011 15:20 UTC (Sat) by corbet (editor, #1) [Link]

Pretty far off-topic, but: it is a rare situation indeed where the removal of information will improve the fidelity of a signal. One might not be able to hear the difference, but I have a hard time imagining how conversion between lossy formats could do anything but degrade the quality. You can't put back something that the first lossy encoding took out, but you can certainly remove parts of the signal that the first encoding preserved.

Lossy format conversion

Posted Dec 12, 2011 2:54 UTC (Mon) by jimparis (subscriber, #38647) [Link]

You can't replace missing information, but you could still make something that sounds better -- in a subjective sense. For example, maybe the mp3 has harsh artifacts at higher frequencies that the ogg encoder would remove.

It could apply to lossy image transformations too. Consider this sample set of images. An initial image is pixelated (lossy), and that result is then blurred (also lossy). Some might argue that the final result looks better than the intermediate one, even though all it did was throw away more information.

But I do agree that this is off-topic, and that such improvement is probably rare in practice.
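
Both sides of this sub-thread can be illustrated with a toy model. The sketch below is only an analogy and has nothing to do with how MP3 or Vorbis actually work: it treats "lossy encoding" as rounding an integer to the nearest multiple of a step, with steps of 6 and 5 chosen arbitrarily for illustration.

```python
def quantize(value: int, step: int) -> int:
    """Round a non-negative integer to the nearest multiple of `step` (toy lossy encoder)."""
    return step * ((value + step // 2) // step)


def compare(samples, first_step: int = 6, second_step: int = 5):
    """Return (total error after one pass, total error after two passes, samples that improved)."""
    total_first = 0
    total_second = 0
    improved = []
    for x in samples:
        once = quantize(x, first_step)
        twice = quantize(once, second_step)  # second lossy pass on the lossy result
        err_first = abs(x - once)
        err_second = abs(x - twice)
        total_first += err_first
        total_second += err_second
        if err_second < err_first:
            improved.append(x)
    return total_first, total_second, improved


total_first, total_second, improved = compare(range(30))
print(total_first, total_second, improved)
```

For the sample 10, the first pass gives 12 and the second pass happens to land back on 10, so an individual value can end up closer to the original. Summed over the whole range, though, the error after two passes is larger than after one, which matches the point that the second encoding cannot restore information and tends to lose more.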

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds