Improving ext4: bigalloc, inline data, and metadata checksums
Posted Nov 30, 2011 0:32 UTC (Wed) by pr1268 (subscriber, #24648)
Thanks for the pointer, and thanks also to yoe's reply above. But my music collection (currently over 10,000 files) has existed for almost four years, ever since I converted the entire collection from MP3 to OGG (via a homemade script that took about a week to run).[1] (I've never converted from FLAC to OGG, although I do have a couple of FLAC files.) I never noticed any corruption in the OGG files until a few months ago, shortly after I did a clean OS reinstall (Slackware 13.37) on bare disks (including copying the music files).[2] I'm all too eager to blame the corruption on ext4 and/or LVM, since those were the only two things that changed immediately before it appeared, but you both make a good point that I should dig a little deeper into the root cause before jumping to conclusions.
[1] I had this collection of (legitimately acquired) songs for years before that, even keeping it on NTFS back in my Win2000/XP days. I abandoned Windows (including NTFS) in August 2004, and my music collection had been entirely MP3 (at 320 kbit/s) ever since I got my first 200 GB hard disk. After seeing the benefits of the Ogg Vorbis format, I decided to switch.
[2] I have four physical disks on which I've set up a PV set spanning all of them for faster I/O. I'm not terribly impressed with the performance (it is somewhat faster), but that's a whole other discussion.
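One way to dig for the root cause is to keep a checksum manifest of the collection and re-verify it after a migration or reinstall: files whose digests change even though nothing rewrote them point to corruption somewhere in the storage stack. The following is a minimal sketch under stated assumptions, not a tool mentioned in the thread. It assumes plain files under one directory tree, and the helpers `file_digest`, `build_manifest`, and `changed_files` are hypothetical names.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def changed_files(old, new):
    """Return sorted paths present in both manifests whose digests differ."""
    return sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
```

Saving the manifest (for example as JSON) before copying the files and comparing it afterwards would show whether the damage happened during the copy or later, which helps separate "bad copy" from "filesystem or LVM problem".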
Posted Nov 30, 2011 0:57 UTC (Wed) by yokem_55 (subscriber, #10498)
Posted Nov 30, 2011 2:11 UTC (Wed) by dskoll (subscriber, #1630)
I also had a very nasty experience with ext4. A server I built using ext4 suffered a power failure and the file system was completely toast after it powered back up. fsck threw hundreds of errors and I ended up rebuilding from scratch.
I have no idea if ext4 was the cause of the problem, but I've never seen that on an ext3 system. I am very nervous... possibly irrationally so, but I think I'll stick to ext3 for now.
Posted Nov 30, 2011 4:52 UTC (Wed) by ringerc (subscriber, #3071)
Write-back caching on volatile storage without careful use of write barriers and forced flushes *will* cause severe data corruption if the cache is wiped by (e.g.) unexpected power loss.
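From user space, the usual way to request a forced flush is fsync(2). The sketch below shows one common pattern for writing a file durably: write to a temporary file, fsync it, atomically rename it over the target, then fsync the containing directory so the rename itself is persisted. It is a minimal illustration, not anything the commenters described; `durable_write` is a hypothetical helper, and the directory fsync assumes Linux/POSIX semantics. None of this helps if the drive or controller ignores flush requests, which is exactly the failure mode described above.

```python
import os


def durable_write(path, data):
    """Write bytes to path so that, once this returns, the data is intended
    to survive power loss, provided the storage honors flush requests."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()              # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to flush the file to stable storage
    os.replace(tmp, path)      # atomically replace the old file (POSIX rename)
    # fsync the directory so the rename itself is durable (Linux/POSIX only)
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```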
Posted Nov 30, 2011 9:00 UTC (Wed) by Cato (subscriber, #7643)
Posted Nov 30, 2011 12:40 UTC (Wed) by dskoll (subscriber, #1630)
My system was using Linux Software RAID, so there wasn't a cheap RAID controller in the mix. You could be correct about the hard drives doing caching, but it seems odd that I've never seen this with ext3 but did with ext4. I am still hoping it was simply bad luck, bad timing, and writeback caching... but I'm also still pretty nervous.
Posted Nov 30, 2011 12:50 UTC (Wed) by dskoll (subscriber, #1630)
Ah... reading http://serverfault.com/questions/279571/lvm-dangers-and-caveats makes me think I was a victim of LVM and no write barriers. I've followed the suggestions in that article. So maybe I'll give ext4 another try.
Posted Nov 30, 2011 20:20 UTC (Wed) by walex (subscriber, #69836)
It is a very well known issue usually involving unaware sysadmins and cheating developers.
Posted Nov 30, 2011 2:13 UTC (Wed) by nix (subscriber, #2304)
I'm quite willing to believe that bad RAM and the like can cause data corruption, but even when I was running ext4 on a machine with RAM so bad that you couldn't md5sum a 10 MB file three times and get the same answer thrice, I had no serious corruption (though it is true that I didn't do much heavy file writing while the RAM was that bad, and I did get occasional bitflips in the page cache, and oopses every day or so).
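The "md5sum the same file three times" check mentioned above is easy to script. The following is a minimal sketch with hypothetical helper names (`md5_of`, `distinct_digests`). It assumes the file is small enough to stay in the page cache between runs; since the bytes on disk don't change, more than one distinct digest points at RAM, cache, or something else in the read path rather than at the filesystem.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def distinct_digests(path, runs=3):
    """Hash the same file several times and return the set of digests seen.
    On healthy hardware the set has exactly one element."""
    return {md5_of(path) for _ in range(runs)}
```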
Posted Nov 30, 2011 12:49 UTC (Wed) by tialaramex (subscriber, #21167)
To someone who isn't looking for RAM/cache issues as the root cause, those often look just like filesystem corruption of whatever kind. They try to open a file and get an error saying it's corrupted. Or they run a program and it mysteriously crashes.
If you _already know_ you have bad RAM, then you say "Ha, bitflip in page cache" and maybe you flush a cache and try again. But if you've already begun to harbour doubts about Seagate disks, or Dell RAID controllers, or XFS then of course that's what you will tend to blame for the problem.
Posted Dec 1, 2011 19:23 UTC (Thu) by nix (subscriber, #2304)
Rare bitflips are normally going to be harmless or fixed up by e2fsck, one would hope. There may be places where a single bitflip, written back, toasts the fs, but I'd hope not. (The various fs fuzzing tools would probably have helped comb those out.)
Posted Nov 30, 2011 10:19 UTC (Wed) by Trou.fr (subscriber, #26289)
Posted Nov 30, 2011 15:35 UTC (Wed) by pr1268 (subscriber, #24648)
From that article: "Mp3 to Ogg: Ogg -q6 was required to achieve transparency against the (high-quality) mp3 with difficult samples."
I used -q8 (or higher) when transcoding with oggenc(1); I've done extensive testing by transcoding back and forth between different formats (including RIFF WAV) and have never noticed any decrease in audio quality or frequency response, even when measured with a spectrum analyzer. I do value your point, though.
Posted Dec 1, 2011 22:54 UTC (Thu) by job (guest, #670)
Posted Dec 10, 2011 1:04 UTC (Sat) by ibukanov (subscriber, #3942)
Lossy format conversion
Posted Dec 10, 2011 15:20 UTC (Sat) by corbet (editor, #1)
Posted Dec 12, 2011 2:54 UTC (Mon) by jimparis (subscriber, #38647)
You can't replace missing information, but you could still make something that sounds better -- in a subjective sense. For example, maybe the mp3 has harsh artifacts at higher frequencies that the ogg encoder would remove.
It could apply to lossy image transformations too. Consider this sample set of images.
An initial image is pixelated (lossy), and that result is then blurred (also lossy). Some might argue that the final result looks better than the intermediate one, even though all it did was throw away more information.
But I do agree that this is off-topic, and that such improvement is probably rare in practice.
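The pixelate-then-blur example can be sketched on a 1-D signal. Pixelating (replacing each block with its mean) and blurring (a moving average) are both many-to-one, so distinct inputs can collapse to the same output and nothing downstream can recover the difference. The helpers `pixelate` and `blur` below are hypothetical illustrations, not taken from the linked images; whether the blurred result *looks* better is subjective and not something the code measures.

```python
def pixelate(signal, block):
    """Replace each block of samples with the block's mean (lossy)."""
    out = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        mean = sum(chunk) / len(chunk)
        out.extend([mean] * len(chunk))
    return out


def blur(signal, radius=1):
    """Moving average over a window of 2*radius+1 samples, clipped at the edges (lossy)."""
    out = []
    n = len(signal)
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out
```

For instance, `[0, 4, 2, 2]` and `[4, 0, 1, 3]` both pixelate (with `block=2`) to `[2.0, 2.0, 2.0, 2.0]`, so the original difference is gone for good. Blurring `[0, 0, 9, 9, 0, 0]` gives `[0.0, 3.0, 6.0, 6.0, 3.0, 0.0]`: the hard block edges become ramps, which is the kind of smoothing that can seem subjectively nicer even though more information was discarded.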
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds