Improving ext4: bigalloc, inline data, and metadata checksums
Posted Nov 29, 2011 23:44 UTC (Tue) by pr1268 (guest, #24648)
Parent article: Improving ext4: bigalloc, inline data, and metadata checksums
> It is solid and reliable
I'm not so sure about that; I've suffered data corruption in a stand-alone ext4 filesystem with a bunch of OGG Vorbis files—occasionally ogginfo(1) reports corrupt OGG files. Fortunately I have backups.
I'm going back to ext3 at the soonest opportunity. FWIW I'm using a multi-disk LVM setup—I wonder if that's the culprit?
Posted Nov 29, 2011 23:49 UTC (Tue)
by yoe (guest, #25743)
[Link] (10 responses)
Try to nail down whether your problem is LVM, one of your disks dying, or ext4, before changing things like that. Otherwise you'll be debugging for a long time to come...
Posted Nov 30, 2011 4:49 UTC (Wed)
by ringerc (subscriber, #3071)
[Link] (8 responses)
I recently had a batch of disks in a backup server start eating data because of a HDD firmware bug. It does happen.
Posted Nov 30, 2011 8:29 UTC (Wed)
by hmh (subscriber, #3838)
[Link]
Posted Nov 30, 2011 12:02 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link] (6 responses)
Screams RAM or cache fault to me. It's that word "occasionally" which does it. Bugs tend to be systematic. Their symptoms may be bizarre, but there's usually something consistent about them, because after all someone has specifically (albeit accidentally) programmed the computer to do exactly whatever it was that happened. Even the most subtle Heisenbug will have some sort of pattern to it.
You should be especially suspicious of the "blame ext4" idea if this "corruption" is one or two corrupted bits rather than big holes in the file. Disks don't tend to lose individual bits. Disk controllers don't tend to lose individual bits. Filesystems don't tend to lose individual bits. These things all deal in blocks; when they lose something, they tend to lose really big pieces.
But dying RAM, heat-damaged CPU cache, or a serial link with too little margin of error, those lose bits. Those are the places to look when something mysteriously becomes slightly corrupted.
Low-level network protocols often lose bits. But because there are checksums in so many layers you won't usually see this in a production system even when someone has goofed (e.g. not implemented Ethernet checksums at all) because the other layers act as a safety net.
Posted Nov 30, 2011 12:44 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link]
Posted Nov 30, 2011 15:42 UTC (Wed)
by pr1268 (guest, #24648)
[Link] (4 responses)
The corruption I was getting was not merely "one or two bits" but rather a hole in the OGG file big enough to cause an audible "skip" in the playback—large enough that I believe a whole block disappeared from the filesystem. Also, the discussion of write barriers came up; I have noatime,data=ordered,barrier=1 as mount options for this filesystem in my /etc/fstab file—I'm pretty sure those are the "safe" defaults (but I could be wrong).
Posted Nov 30, 2011 17:31 UTC (Wed)
by rillian (subscriber, #11344)
[Link] (3 responses)
Ogg streams carry a CRC on each page, and decoders discard a whole page when its checksum fails. That means that a few bit errors will cause the decoder to drop ~100 ms of audio at a time, and tools will report this as a 'hole in data'. To see whether it's disk or filesystem corruption, look for pages of zeros in a hexdump around where the glitch is.
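(A quick sketch, not part of the comment above, of automating that check: scan a file for long runs of zero bytes, which would point at lost filesystem blocks rather than scattered bit flips. Python is used here purely for illustration, and the 4 KiB threshold is an arbitrary assumption.)

# Report long runs of zero bytes in a file; block-sized, block-aligned runs
# suggest a filesystem block was lost rather than a few bits flipped.
import sys

def find_zero_runs(path, min_len=4096):
    runs = []
    offset = 0          # absolute position of the start of the current chunk
    run_start = None    # absolute position where the current zero run began
    with open(path, "rb") as f:
        while True:
            chunk = f.read(65536)
            if not chunk:
                break
            for i, byte in enumerate(chunk):
                if byte == 0:
                    if run_start is None:
                        run_start = offset + i
                else:
                    if run_start is not None and offset + i - run_start >= min_len:
                        runs.append((run_start, offset + i - run_start))
                    run_start = None
            offset += len(chunk)
    if run_start is not None and offset - run_start >= min_len:
        runs.append((run_start, offset - run_start))
    return runs

if __name__ == "__main__":
    for start, length in find_zero_runs(sys.argv[1]):
        print("zero run at byte %d, length %d" % (start, length))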
Posted Dec 1, 2011 3:17 UTC (Thu)
by quotemstr (subscriber, #45331)
[Link] (2 responses)
Posted Dec 1, 2011 10:07 UTC (Thu)
by mpr22 (subscriber, #60784)
[Link]
Posted Dec 1, 2011 18:25 UTC (Thu)
by rillian (subscriber, #11344)
[Link]
The idea with the Ogg checksums was to protect the listener's ears (and possibly speakers) from corrupt output. It's also nice to have a built-in check for data corruption in your archives, which is working as designed here.
What you said is valid for video, because we're more tolerant of high frequency visual noise, and because the extra data dimensions and longer prediction intervals mean you can get more useful information from a corrupt frame than you do with audio. Making the checksum optional for the packet data is one of the things we'd do if we ever revised the Ogg format.
Posted Dec 2, 2011 22:09 UTC (Fri)
by giraffedata (guest, #1954)
[Link]
That's not shotgun debugging (and not what the Jargon File calls it). The salient property of a shotgun isn't that it makes radical changes, but that it makes widespread changes. So you hit what you want to hit without aiming at it.
Shotgun debugging is trying lots of little things, none that you particularly believe will fix the bug.
In this case, the fallback to ext3 is fairly well targeted: the problem came contemporaneously with this one major and known change to the system, so it's not unreasonable to try undoing that change.
The other comments give good reason to believe this is not the best way forward, but it isn't because it's shotgun debugging.
There must be a term for the debugging mistake in which you give too much weight to the one recent change you know about in the area; I don't know what it is. (I've lost count of how many people accused me of breaking their Windows system because after I used it, there was a Putty icon on the desktop and something broke soon after that).
Posted Nov 30, 2011 0:00 UTC (Wed)
by bpepple (subscriber, #50705)
[Link] (17 responses)
Posted Nov 30, 2011 0:32 UTC (Wed)
by pr1268 (guest, #24648)
[Link] (16 responses)
Thanks for the pointer, and thanks also to yoe's reply above. But, my music collection (currently over 10,000 files) has existed for almost four years, ever since I converted the entire collection from MP3 to OGG (via a homemade script which took about a week to run).[1] (I've never converted from FLAC to OGG, although I do have a couple of FLAC files.) I never noticed any corruption in the OGG files until a few months ago, shortly after I did a clean OS re-install (Slackware 13.37) on bare disks (including copying the music files).[2] I'm all too eager to blame the corruption on ext4 and/or LVM, since those were the only two things that changed immediately prior to the corruption, but you both bring up a good point that maybe I should dig a little deeper into finding the root cause before I jump to conclusions.

[1] I've had this collection of (legitimately acquired) songs for years prior, even having it on NTFS back in my Win2000/XP days. I abandoned Windows (including NTFS) in August 2004, and my music collection was entirely MP3 format (at 320 kbit) since I got my first 200GB hard disk. After seeing the benefits of the OGG Vorbis format, I decided to switch.

[2] I have four physical disks (volumes) in which I've set up PV set spanning across all disks for fast I/O performance. I'm not totally impressed at the performance—it is somewhat faster—but that's a whole other discussion.
Posted Nov 30, 2011 0:57 UTC (Wed)
by yokem_55 (subscriber, #10498)
[Link] (9 responses)
Posted Nov 30, 2011 2:11 UTC (Wed)
by dskoll (subscriber, #1630)
[Link] (5 responses)
I also had a very nasty experience with ext4. A server I built using ext4 suffered a power failure and the file system was completely toast after it powered back up. fsck threw hundreds of errors and I ended up rebuilding from scratch.
I have no idea if ext4 was the cause of the problem, but I've never seen that on an ext3 system. I am very nervous... possibly irrationally so, but I think I'll stick to ext3 for now.
Posted Nov 30, 2011 4:52 UTC (Wed)
by ringerc (subscriber, #3071)
[Link] (3 responses)
Write-back caching on volatile storage without careful use of write barriers and forced flushes *will* cause severe data corruption if the storage is cleared due to (eg) unexpected power loss.
Posted Nov 30, 2011 9:00 UTC (Wed)
by Cato (guest, #7643)
[Link] (2 responses)
Posted Nov 30, 2011 12:40 UTC (Wed)
by dskoll (subscriber, #1630)
[Link] (1 responses)
My system was using Linux Software RAID, so there wasn't a cheap RAID controller in the mix. You could be correct about the hard drives doing caching, but it seems odd that I've never seen this with ext3 but did with ext4. I am still hoping it was simply bad luck, bad timing, and writeback caching... but I'm also still pretty nervous.
Posted Nov 30, 2011 12:50 UTC (Wed)
by dskoll (subscriber, #1630)
[Link]
Ah... reading http://serverfault.com/questions/279571/lvm-dangers-and-caveats makes me think I was a victim of LVM and no write barriers. I've followed the suggestions in that article. So maybe I'll give ext4 another try.
Posted Nov 30, 2011 20:20 UTC (Wed)
by walex (guest, #69836)
[Link]
It is a very well known issue, usually involving unaware sysadmins and cheating developers.
Posted Nov 30, 2011 2:13 UTC (Wed)
by nix (subscriber, #2304)
[Link] (2 responses)
I'm quite willing to believe that bad RAM and the like can cause data corruption, but even when I was running ext4 on a machine with RAM so bad that you couldn't md5sum a 10Mb file three times and get the same answer thrice, I had no serious corruption (though it is true that I didn't engage in major file writing while the RAM was that bad, and I did get the occasional instances of bitflips in the page cache, and oopses every day or so).
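(A minimal sketch of the repeated-checksum test described above, assuming Python and MD5 purely for illustration: hash the same file several times and compare the digests. On healthy hardware they always match; differing digests point at RAM or cache trouble rather than the filesystem.)

# Hash the same file three times; any disagreement means the bytes read (or the
# hashing itself) changed between runs, which healthy hardware never does.
import hashlib
import sys

def file_md5(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

digests = {file_md5(sys.argv[1]) for _ in range(3)}
print("consistent" if len(digests) == 1 else "INCONSISTENT: %s" % digests)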
Posted Nov 30, 2011 12:49 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
To someone who isn't looking for RAM/cache issues as the root cause, those often look just like filesystem corruption of whatever kind. They try to open a file and get an error saying it's corrupted. Or they run a program and it mysteriously crashes.
If you _already know_ you have bad RAM, then you say "Ha, bitflip in page cache" and maybe you flush a cache and try again. But if you've already begun to harbour doubts about Seagate disks, or Dell RAID controllers, or XFS then of course that's what you will tend to blame for the problem.
Posted Dec 1, 2011 19:23 UTC (Thu)
by nix (subscriber, #2304)
[Link]
Rare bitflips are normally going to be harmless or fixed up by e2fsck, one would hope. There may be places where a single bitflip, written back, toasts the fs, but I'd hope not. (The various fs fuzzing tools would probably have helped comb those out.)
Posted Nov 30, 2011 10:19 UTC (Wed)
by Trou.fr (subscriber, #26289)
[Link] (5 responses)
Posted Nov 30, 2011 15:35 UTC (Wed)
by pr1268 (guest, #24648)
[Link] (4 responses)
From that article: "Mp3 to Ogg: Ogg -q6 was required to achieve transparency against the (high-quality) mp3 with difficult samples." I used -q8 (or higher) when transcoding with oggenc(1); I've done extensive testing by transcoding back-and-forth to different formats (including RIFF WAV) and have never noticed any decrease in audio quality or frequency response, even when measured with a spectrum analyzer. I do value your point, though.
Posted Dec 1, 2011 22:54 UTC (Thu)
by job (guest, #670)
[Link] (3 responses)
Posted Dec 10, 2011 1:04 UTC (Sat)
by ibukanov (subscriber, #3942)
[Link] (2 responses)
Posted Dec 10, 2011 15:20 UTC (Sat)
by corbet (editor, #1)
[Link] (1 responses)
Posted Dec 12, 2011 2:54 UTC (Mon)
by jimparis (guest, #38647)
[Link]
You can't replace missing information, but you could still make something that sounds better -- in a subjective sense. For example, maybe the mp3 has harsh artifacts at higher frequencies that the ogg encoder would remove.
It could apply to lossy image transformations too. Consider this sample set of images.
An initial image is pixelated (lossy), and that result is then blurred (also lossy). Some might argue that the final result looks better than the intermediate one, even though all it did was throw away more information.
But I do agree that this is off-topic, and that such improvement is probably rare in practice.
Posted Nov 30, 2011 8:50 UTC (Wed)
by ebirdie (guest, #512)
[Link]
Lesson learned: it pays to keep data on smaller volumes, although it is very very tempting to stuff data onto ever-bigger volumes and postpone the headache of splitting and managing smaller volumes.
Posted Nov 30, 2011 8:57 UTC (Wed)
by Cato (guest, #7643)
[Link]
This may help: http://serverfault.com/questions/279571/lvm-dangers-and-c...
Posted Nov 30, 2011 21:01 UTC (Wed)
by walex (guest, #69836)
[Link] (58 responses)
But the main issue is not that; by all accounts ext4 is quite reliable (when on a properly set up storage system and properly used by applications).
The big problem with 'ext4' is that its only reason to be is to allow Red Hat customers to upgrade in place existing systems, and what Red Hat wants, Red Hat gets (also because they usually pay for that and the community is very grateful).
Other than that, for new "typical" systems almost only JFS and XFS make sense (and perhaps in the distant future BTRFS).
In particular, JFS should have been the "default" Linux filesystem instead of ext[23] for a long time. Not making JFS the default was probably the single worst strategic decision for Linux (but it can be argued that letting GKH near the kernel was even worse). JFS is still probably (by a significant margin) the best "all-rounder" filesystem (XFS beats it in performance only on very parallel large workloads, and it is way more complex, and JFS has two uncommon but amazingly useful special features).
Sure it was very convenient to let people (in particular Red Hat customers) upgrade in place from 'ext' to 'ext2' to 'ext3' to 'ext4' (each in-place upgrade keeping existing files unchanged and usually with terrible performance), but given that when JFS was introduced the Linux base was growing rapidly, new installations could be expected to outnumber old ones very soon, making that point largely moot.
PS: There are other little known good filesystems, like OCFS2 (which is pretty good in non-clustered mode) and NILFS2 (probably going to be very useful on SSDs), but JFS is amazingly still very good. Reiser4 was also very promising (it seems little known that the main developer of BTRFS was also the main developer of Reiser4). As a pet peeve of mine UDF could have been very promising too, as it was quite well suited to RW media like hard disks too (and the Linux implementation almost worked in RW mode on an ordinary partition), and also to SSDs.
Posted Nov 30, 2011 22:07 UTC (Wed)
by yokem_55 (subscriber, #10498)
[Link]
Posted Nov 30, 2011 23:12 UTC (Wed)
by Lennie (subscriber, #49641)
[Link]
Posted Dec 1, 2011 0:53 UTC (Thu)
by SLi (subscriber, #53131)
[Link] (37 responses)
The only filesystem, years back, that could have been said to outperform ext4 on most counts was ReiserFS 4. Unfortunately, on each of the three times I stress tested it I hit different bugs that caused data loss.
Posted Dec 1, 2011 2:03 UTC (Thu)
by dlang (guest, #313)
[Link]
I haven't benchmarked against ext4, but I have done benchmarks with the filesystems prior to it, and I've run into many cases where JFS and XFS are clear winners.
even against ext4, if you have a fileserver situation where you have lots of drives involved, XFS is still likely to be a win; ext4 just doesn't have enough developers/testers with large numbers of disks to work with (this isn't my opinion, it's a statement from Ted Ts'o in response to someone pointing out where ext4 doesn't do as well as XFS with a high-performance disk array)
Posted Dec 2, 2011 18:52 UTC (Fri)
by walex (guest, #69836)
[Link] (35 responses)
JFS or XFS being the preferable filesystem on normal Linux use. Believe me, I've tried them both, benchmarked them both, and on almost all counts ext4 outperforms the two by a really wide margin (note that strictly speaking I'm not comparing the filesystems but their Linux implementations). In addition any failures have tended to be much worse on JFS and XFS than on ext4.
Posted Dec 2, 2011 23:15 UTC (Fri)
by tytso (subscriber, #9993)
[Link] (34 responses)
So benchmarking JFS against file systems that are engineered to be safe against power failures, such as ext4 and XFS, isn't particularly fair. You can disable cache flushes for both ext4 and XFS, but would you really want to run in an unsafe configuration for production servers? And JFS doesn't even have an option for enabling barrier support, so you can't make it run safely without fixing the file system code.
Posted Dec 3, 2011 0:56 UTC (Sat)
by walex (guest, #69836)
[Link] (31 responses)
As to JFS and performance and barriers with XFS and ext4:
Posted Dec 3, 2011 1:56 UTC (Sat)
by dlang (guest, #313)
[Link] (27 responses)
Posted Dec 3, 2011 3:06 UTC (Sat)
by raven667 (subscriber, #5198)
[Link] (26 responses)
Posted Dec 3, 2011 6:29 UTC (Sat)
by dlang (guest, #313)
[Link] (25 responses)
it should make barriers very fast so there isn't a big performance hit from leaving them on, but if you disable barriers and think the battery will save you, you are sadly mistaken
Posted Dec 3, 2011 11:05 UTC (Sat)
by nix (subscriber, #2304)
[Link] (24 responses)
If the power is out for months, civilization has probably fallen, and I'll have bigger things to care about than a bit of data loss. Similarly I don't care that battery backup doesn't defend me against people disconnecting the controller or pulling the battery while data is in transit. What other situation does battery backup not defend you against?
Posted Dec 3, 2011 15:39 UTC (Sat)
by dlang (guest, #313)
[Link] (15 responses)
There are two steps in getting data onto the disks:
1. writing from the OS to the raid card
2. writing from the raid card to the drives
Battery backup on the raid card makes step 2 reliable. This means that if the data is written to the raid card, it should be considered as safe as if it were on the actual drives (it's not quite that safe, but close enough).
However, without barriers, the data isn't sent from the OS to the raid card in any predictable pattern. It's sent at the whim of the OS cache flushing algorithm. This can result in some data making it to the raid controller and other data not making it there if you have an unclean shutdown. If the data is never sent to the raid controller, then the battery there can't do you any good.
With barriers, the system can enforce that data gets to the raid controller in a particular order, and so the only data that would be lost is the data written since the last barrier operation completed.
note that if you are using software raid, things are much uglier as the OS may have written the stripe to one drive and not to another (barriers only work on a single drive, not across drives). this is one of the places where hardware raid is significantly more robust than software raid.
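(For illustration only: a rough user-space analogue of that ordering guarantee, in Python. This is not the kernel's barrier machinery; it just shows the property the filesystem wants for its journal, namely that write B must not reach stable storage before write A. The file name is made up.)

# Enforce "A before B" ordering from an application: flush A to stable storage
# before issuing B. Filesystem barriers give the journal the same guarantee
# inside the storage stack without an application-level fsync for every record.
import os

fd = os.open("journal.log", os.O_CREAT | os.O_WRONLY | os.O_APPEND, 0o644)
try:
    os.write(fd, b"A: record describing the pending change\n")
    os.fsync(fd)   # A is durable before B is even issued (the ordering point)
    os.write(fd, b"B: the change itself\n")
    os.fsync(fd)
finally:
    os.close(fd)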
Posted Dec 3, 2011 18:04 UTC (Sat)
by raven667 (subscriber, #5198)
[Link] (14 responses)
Posted Dec 3, 2011 19:31 UTC (Sat)
by dlang (guest, #313)
[Link] (11 responses)
barriers preserve the ordering of writes throughout the entire disk subsystem, so once the filesystem decides that a barrier needs to be at a particular place, going through a layer of LVM (before it supported barriers) would run the risk of the writes getting out of order
with barriers on software raid, the raid layer won't let the writes on a particular disk get out of order, but it doesn't enforce that all writes before the barrier on disk 1 get written before the writes after the barrier on disk 2
Posted Dec 4, 2011 6:17 UTC (Sun)
by raven667 (subscriber, #5198)
[Link] (10 responses)
In any event, there is a bright line between how the kernel handles internal data structures and what the hardware does. For storage with a battery-backed write cache, once an I/O is posted to the storage it is as good as done, so there is no need to ask the storage to commit its blocks in any particular fashion. The only issue is that the kernel must issue the I/O requests in a responsible manner.
Posted Dec 4, 2011 6:41 UTC (Sun)
by dlang (guest, #313)
[Link] (8 responses)
per the messages earlier in this thread, JFS does not, for a long time (even after it was the default in Fedora), LVM did not.
so barriers actually working correctly is relatively new (and very recently they have found more efficient ways to enforce ordering than the older version of barriers).
Posted Dec 4, 2011 11:24 UTC (Sun)
by tytso (subscriber, #9993)
[Link]
It shouldn't be that hard to add support, but no one is doing any development work on it.
Posted Dec 4, 2011 16:26 UTC (Sun)
by rahulsundaram (subscriber, #21946)
[Link] (6 responses)
Posted Dec 4, 2011 16:50 UTC (Sun)
by dlang (guest, #313)
[Link] (5 responses)
Fedora has actually been rather limited in its support of various filesystems. The kernel supports the different filesystems, but the installer hasn't given you the option of using XFS or JFS for your main filesystem, for example.
Posted Dec 4, 2011 17:41 UTC (Sun)
by rahulsundaram (subscriber, #21946)
[Link] (4 responses)
"JFS does not, for a long time (even after it was the default in Fedora)"
You are inaccurate about your claim on the installer as well. XFS has been a standard option in Fedora for several releases, ever since Red Hat hired Eric Sandeen from SGI to maintain it (and help develop ext4). JFS is a non-standard option.
Posted Dec 4, 2011 19:22 UTC (Sun)
by dlang (guest, #313)
[Link] (3 responses)
re: XFS, I've been using linux since '94, so XFS support in the installer is very recent :-)
I haven't been using Fedora for quite a while; my experience with Red Hat distros is mostly RHEL (and CentOS), which lag behind. I believe that RHEL 5 still didn't support XFS in the installer.
Posted Dec 4, 2011 19:53 UTC (Sun)
by rahulsundaram (subscriber, #21946)
[Link]
http://fedoraproject.org/wiki/Releases/10/Beta/ReleaseNot...
That is early 2008. RHEL 6 has XFS support as an add-on subscription, and it is supported within the installer as well IIRC.
Posted Dec 5, 2011 16:15 UTC (Mon)
by wookey (guest, #5501)
[Link] (1 responses)
(I parsed it the way rahulsundaram did too - it's not clear).
Posted Dec 5, 2011 16:59 UTC (Mon)
by dlang (guest, #313)
[Link]
Posted Jan 30, 2012 8:50 UTC (Mon)
by sbergman27 (guest, #10767)
[Link]
Posted Dec 8, 2011 17:54 UTC (Thu)
by nye (subscriber, #51576)
[Link] (1 responses)
Surely what you're describing is a cache flush, not a barrier?
A barrier is intended to control the *order* in which two pieces of data are written, not when or even *if* they're written. A barrier *could* be implemented by issuing a cache flush in between writes (maybe this is what's commonly done in practice?) but in that case you're getting slightly more than you asked for (ie. you're getting durability of the first write), with a corresponding performance impact.
Posted Dec 8, 2011 23:24 UTC (Thu)
by raven667 (subscriber, #5198)
[Link]
Posted Dec 12, 2011 12:01 UTC (Mon)
by jlokier (guest, #52227)
[Link] (7 responses)
Some battery-backed disk write caches can commit the RAM to flash storage or something else, on battery power, in the event that the power supply is removed for a long time. These systems don't need a large battery and provide stronger long-term guarantees.
Even ignoring ext3's no barrier default, and LVM missing them for ages, there is the kernel I/O queue (elevator) which can reorder requests. If the filesystem issues barrier requests, the elevator will send writes to the storage device in the correct order. If you turn off barriers in the filesystem when mounting, the kernel elevator is free to send writes out of order; then after a system crash, the system recovery will find inconsistent data from the storage unit. This can happen even after a normal crash such as a kernel panic or hard-reboot, no power loss required.
Whether that can happen when you tell the filesystem not to bother with barriers depends on the filesystem's implementation. To be honest, I don't know how ext3/4, xfs, btrfs etc. behave in that case. I always use barriers :-)
Posted Dec 12, 2011 15:40 UTC (Mon)
by andresfreund (subscriber, #69562)
[Link] (6 responses)
Posted Dec 12, 2011 18:14 UTC (Mon)
by dlang (guest, #313)
[Link] (5 responses)
there is no modern filesystem that waits for the data to be written before proceeding. Every single filesystem out there will allow its writes to be cached and actually written out later (in some cases, this can be _much_ later)
when the OS finally gets around to writing the data out, it has no idea what the application (or filesystem) cares about, unless there are barriers issued to tell the OS that 'these writes must happen before these other writes'
Posted Dec 12, 2011 18:15 UTC (Mon)
by andresfreund (subscriber, #69562)
[Link] (4 responses)
Posted Dec 12, 2011 18:39 UTC (Mon)
by dlang (guest, #313)
[Link] (3 responses)
it actually doesn't stop processing requests and wait for the confirmation from the disk; it issues a barrier to tell the rest of the storage stack not to reorder around that point and goes on to process the next request and get it in flight.
Posted Dec 12, 2011 18:53 UTC (Mon)
by andresfreund (subscriber, #69562)
[Link] (2 responses)
It worked a little bit more like you describe before 2.6.37, but back then it waited if barriers were disabled.
Posted Dec 13, 2011 13:35 UTC (Tue)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Dec 13, 2011 13:38 UTC (Tue)
by andresfreund (subscriber, #69562)
[Link]
Posted Dec 3, 2011 11:00 UTC (Sat)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Dec 3, 2011 18:06 UTC (Sat)
by raven667 (subscriber, #5198)
[Link]
Posted Dec 3, 2011 20:33 UTC (Sat)
by tytso (subscriber, #9993)
[Link]
ext3 was first supported by RHEL as of RHEL 2 which was released May 2003 --- and as you can see from the dates above, we had developers working at a wide range of companies, thus making it a community-supported distribution, long before Red Hat supported ext3 in their RHEL product. In contrast, most of the reiserfs developers worked at Namesys (with one or two exceptions, most notably Chris Mason when he was at SuSE), and most of the XFS developers worked at SGI.
Posted Dec 5, 2011 16:29 UTC (Mon)
by wookey (guest, #5501)
[Link] (1 responses)
When I managed to repair them I found that many files had big blocks of zeros in them - essentially anything that was in the journal and had not been written. Up to that point I had naively thought that the point of the journal was to keep actual data, not just filesystem metadata. Files that have been 'repaired' by being silently filled with big chunks of zeros did not impress me.
So I now believe that XFS is/was good, but only on properly UPSed servers. Am I wrong about that?
Posted Dec 5, 2011 17:03 UTC (Mon)
by dlang (guest, #313)
[Link]
XFS caches more stuff than ext does, so a crash loses more stuff.
so XFS or ext* with barriers disabled is not good to use. For a long time, running these things on top of LVM had the side effect of disabling barriers; it's only recently that LVM gained the ability to support them.
JFS is not good to use (as it doesn't have barriers at all)
note that while XFS is designed to be safe, that doesn't mean that it won't lose data, just that the metadata will not be corrupt.
the only way to not lose data in a crash/power failure is to do no buffering at all, and that will absolutely kill your performance (and we are talking hundreds of times slower, not just a few percentage points)
Posted Dec 1, 2011 2:58 UTC (Thu)
by tytso (subscriber, #9993)
[Link] (2 responses)
JFS was a very good file system, and at the time when it was released, it certainly was better than ext3. But there's a lot more to having a successful open source project beyond having the best technology. The fact that ext2 was well understood, and had a mature set of file system utilities, including tools like "debugfs", are one of the things that do make a huge difference towards people accepting the technology.
At this point, though, ext4 has a number of features which JFS lacks, including delayed allocation, fallocate, punch, and TRIM/discard support. These are all features which I'm sure JFS would have developed if it still had a development community, but when IBM decided to defund the project, there were few or no developers who were not IBM'ers, and so the project stalled out.
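(As an aside, a minimal sketch, assuming Python on Linux, of what the fallocate support mentioned above looks like from user space; the file name and size are made up.)

# Preallocate 16 MiB of blocks for a file without writing any data; on a
# filesystem with fallocate support (ext4, XFS) this reserves the space cheaply
# instead of writing zeros.
import os

fd = os.open("preallocated.dat", os.O_CREAT | os.O_WRONLY, 0o644)
try:
    os.posix_fallocate(fd, 0, 16 * 1024 * 1024)
finally:
    os.close(fd)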
---
People who upgrade in place from ext3 to ext4 will see roughly half the performance increase compared to doing a backup, reformat to ext4, and restore operation. But they *do* see a performance increase if they do an upgrade-in-place operation. In fact, even if they don't upgrade the file system image, and use ext4 to mount an ext2 file system image, they will see some performance improvement. So this gives them flexibility, which from a system administrator's point of view, is very, very important!
---
Finally, I find it interesting that you consider OCFS2 "pretty good" in non-clustered mode. OCFS2 is a fork of the ext3 code base[1] (it even uses fs/jbd and now fs/jbd2) with support added for clustered operation, and with support for extents (which ext4 has as well, of course). It doesn't have delayed allocation. But ext4 will be better than ocfs2 in non-clustered mode, simply because it's been optimized for it. The fact that you seem to think OCFS2 is "pretty good", while you don't seem to think much of ext4, makes me wonder whether you have some pretty strong biases against the ext[234] file system family.
[1] Ocfs2progs is also a fork of e2fsprogs. Which they did with my blessing, BTW. I'm glad to see that the code that has come out of the ext[234] project has been useful in so many places. Heck, parts of e2fsprogs (the UUID library, which I relicensed to BSD for Apple's benefit) can be found in Mac OS X! :-)
Posted Dec 1, 2011 20:25 UTC (Thu)
by sniper (guest, #13219)
[Link] (1 responses)
ocfs2 is not a fork of ext3 and neither is ocfs2-tools a fork of e2fsprogs. But both have benefited a _lot_ from ext3. In some instances, we copied code (non-indexed dir layout). In some instances, we used a different approach because of collective experience (indexed dir). grep ext3 fs/ocfs2/* for more.
The toolset has a lot more similarities to e2fsprogs. It was modeled after it because it is well designed and to also allow admins to quickly learn it. The tools even use the same parameter names where possible. grep -r e2fsprogs * for more.
BTW, ocfs2 has had bigalloc (aka clusters) since day 1, inline-data since 2.6.24 and metadata checksums since 2.6.29. Yes, it does not have delayed allocations.
Posted Apr 13, 2012 19:30 UTC (Fri)
by fragmede (guest, #50925)
[Link]
LVM snapshots are a joke if you have *lots* of snapshots, though I haven't looked at btrfs snapshots since it became production ready.
Posted Dec 1, 2011 3:22 UTC (Thu)
by tytso (subscriber, #9993)
[Link]
At the time when I started working on ext4, XFS developers were all mostly still working for SGI, so there was a similar problem with the distributions not having anyone who could support or debug XFS problems. This has changed more recently, as more and more XFS developers have left (voluntarily or involuntarily) SGI and joined companies such as Red Hat. XFS has also improved its small file performance, which was something it didn't do particularly well simply because SGI didn't optimize for that; its sweet spot was and still is really large files on huge RAID arrays.
One of the reasons why I felt it was necessary to work on ext4 was that everyone I talked to who had created a file system before in the industry, whether it was GPFS (IBM's cluster file system), or Digital Unix's advfs, or Sun's ZFS, gave estimates of somewhere between 50 to 200 person years worth of effort before the file system was "ready". Even if we assume that open source development practices would make development go twice as fast, and if we ignore the high end of the range because cluster file systems are hard, I was skeptical it would get done in two years (which was the original estimate) given the number of developers it was likely to attract. Given that btrfs started at the beginning of 2007, and here we are almost at 2012, I'd say my fears were justified.
At this point, I'm actually finding that ext4 has found a second life as a server file system in large cloud data centers. It turns out that if you don't need the fancy-schmancy features that copy-on-write file systems give you, they aren't free. In particular, ZFS has a truly prodigious appetite for memory, and one of the things about cloud servers is that in order for them to make economic sense, you try to pack as many jobs or VMs onto them as possible, so they are constantly under memory pressure. We've done some further optimizations so that ext4 performs much better when under memory pressure, and I suspect at this point that in a cloud setting, using a CoW file system may simply not make sense.
Once btrfs is ready for some serious benchmarking, it would be interesting to benchmark it under serious memory pressure, and see how well it performs. Previous CoW file systems, such as BSD's lfs two decades ago, and ZFS more recently, have needed a lot of memory to cache metadata blocks, and it will be interesting to see if btrfs has similar issues.
Posted Dec 1, 2011 19:36 UTC (Thu)
by nix (subscriber, #2304)
[Link] (13 responses)
I also see that I was making some sort of horrible mistake by installing ext4 on all my newer systems, but you never make clear what that mistake might have been.
I've been wracking my brains and I can't think of one thing Greg has done that has come to public knowledge and could be considered bad. So this looks like groundless personal animosity to me.
Posted Dec 1, 2011 19:41 UTC (Thu)
by andresfreund (subscriber, #69562)
[Link]
Posted Dec 2, 2011 11:35 UTC (Fri)
by alankila (guest, #47141)
[Link] (5 responses)
Posted Dec 2, 2011 18:40 UTC (Fri)
by nix (subscriber, #2304)
[Link]
(Yes, I read the release notes, so didn't fall into these traps, but FFS, at least the latter problem was trivial to work around -- one line in the makefile to drop a symlink in /sbin -- and they just didn't bother.)
Posted Dec 2, 2011 23:40 UTC (Fri)
by walex (guest, #69836)
[Link] (3 responses)
As to udev, some people dislike smarmy shysters who replace well designed working subsystems seemingly for the sole reason of making a political landgrab, because the replacement has both more kernel complexity and more userland complexity and less stability. The key features of devfs were that it would populate /dev automatically from the kernel with basic device files (major, minor) and then use a very simple userland daemon to add extra aliases as required. It turns out that, after several attempts to get it to work, udev relies on the kernel exposing exactly the same information in /sys, so there has been no migration of functionality from kernel to userspace. And the userland part is also far more complex and unstable than devfsd ever was (for example, devfs did not require cold start). And udev is just the most shining example of a series of similar poor decisions (which however seem to have been improving a bit with time).
Posted Dec 3, 2011 3:16 UTC (Sat)
by raven667 (subscriber, #5198)
[Link] (1 responses)
Posted Dec 3, 2011 11:07 UTC (Sat)
by nix (subscriber, #2304)
[Link]
Posted Dec 3, 2011 4:04 UTC (Sat)
by alankila (guest, #47141)
[Link]
Posted Dec 3, 2011 0:12 UTC (Sat)
by walex (guest, #69836)
[Link] (5 responses)
«tytso wasn't working for RH when ext4 started up, and still isn't working for them now. So their influence must be more subtle.»

Quite irrelevant: a lot of file systems were somebody's hobby file systems, but they did not achieve prominence and instant integration into mainline even if rather alpha, and Red Hat did not spend enormous amounts of resources quality-assuring them to make them production ready either, and quality assurance is a pretty vital detail for file systems, as the Namesys people discovered. Pointing to tytso is just misleading. Also because ext4 really was seeded by Lustre people before tytso became active on it in his role as ext3 curator (and in 2005, which is 5 years later than when JFS became available).

Similarly for BTRFS: it was initiated by Oracle (who have an ext3 installed base), but its main appeal is still as the next in-place upgrade on the Red Hat installed base (thus the interest in trialing it in Fedora, where EL candidate stuff is mass tested), even if for once it is not just an extension of the ext line but has some interesting new angles.

But considering ext4 on its own is a partial view; one must consider the pre-existing JFS and XFS stability and robustness and performance. From a technical point of view ext4 is not that interesting (euphemism) and its sole appeal is in-place upgrades, and the widest installed base for that is Red Hat, and to a large extent that could have been said of ext3 too.
Posted Dec 3, 2011 0:52 UTC (Sat)
by nix (subscriber, #2304)
[Link] (1 responses)
And if you're claiming that btrfs is effectively RH-controlled merely because RH customers will benefit, then *everything* that happens to Linux must by your bizarre definition be RH-controlled. That's a hell of a conspiracy: so vague that the coconspirators don't even realise they're conspiring!
Posted Apr 13, 2012 19:34 UTC (Fri)
by fragmede (guest, #50925)
[Link]
Posted Dec 3, 2011 19:45 UTC (Sat)
by tytso (subscriber, #9993)
[Link] (2 responses)
But you can't have it both ways. If that code had been in use by paying Lustre companies, then it's hardly alpha code, wouldn't you agree?
And why did the Lustre developers at ClusterFS choose ext3? Because the engineers they hired knew ext3, since it was a community-supported distribution, whereas JFS was controlled by a core team that was all IBM'ers, and hardly anyone outside of IBM was available who knew JFS really well.
But as others have already pointed out, there was no grand conspiracy to pick ext2/3/4 over its competition. It won partially due to its installed base, and partially because of the availability of developers who understood it (and books written about it, etc., etc., etc.) The way you've been writing, you seem to think there was some secret cabal (at Red Hat?) that made these decisions, and that there was a "mistake" because they didn't choose your favorite file systems.
The reality is that file systems all have trade-offs, and what's good for some people is not so great for others. Take a look at some of the benchmarks at btrfs.boxacle.net; they're a bit old, but they are well done, and they show that across many different workloads at that time (2-3 years ago) there was no one single file system that was the best across all of the different workloads. So anyone who only uses a single workload, or a single hardware configuration, and tries to use that to prove that their favorite file system is the "best" is trying to sell you something, or is a slashdot kiddie who has a fan-favorite file system. The reality is a lot more complicated than that, and it's not just about performance. (Truth be told, for many/most use cases, the file system is not the bottleneck.) Issues like availability of engineers to support the file system in a commercial product, the maturity of the userspace support tools, ease of maintainability, etc. are at least as important if not more so.
Posted Dec 3, 2011 20:43 UTC (Sat)
by dlang (guest, #313)
[Link] (1 responses)
Add to this the fact that you did not need to reformat your system to use ext3 when upgrading, and the fact that ext3 became the standard (taking over from ext2, which was the prior standard) is a no-brainer, and no conspiracy.
In those days XFS would outperform ext3, but only in benchmarks on massive disk arrays (which were even more out of people's price ranges at that point than they are today)
XFS was scalable to high-end systems, but its low-end performance was mediocre
looking at things nowadays, XFS has had a lot of continuous improvement and integration, both improving its high-end performance and reliability, and improving its low-end performance without losing its scalability. There are also more people, working for more companies, supporting it, making it far less of a risk today, with far more in the way of upsides.
JFS has received very little attention after the initial code dump from IBM, and there is now nobody actively maintaining/improving it, so it really isn't a good choice going forward.
reiserfs had some interesting features and performance, but it suffered from some seriously questionable benchmarking (the one that turned me off to it entirely was a spectacular benchmarking test that reiserfs completed in 20 seconds but that took several minutes on ext*; then we discovered that reiserfs defaulted to a 30 second delay before writing everything to disk, so the entire benchmark was complete before any data started getting written to disk, and after that I didn't trust anything that they claimed), and a few major problems (the fsck scrambling is a huge one). It was then abandoned by the developer in favor of the future reiserfs4, with improvements that were submitted being rejected as they were going to be part of the new, incompatible filesystem.
ext4 is in large part a new filesystem whose name just happens to be similar to what people are running, but it has now been out for several years, with developers who are responsive to issues, are a diverse set (no vendor lock-in or dependencies) and are willing to say where the filesystem is not the best choice.
btrfs is still under development (the fact that they don't yet have a fsck tool is telling), is making claims that seem too good to be true, and they have already run into several cases of pathological behavior where they have had to modify things significantly. I wouldn't trust it for anything other than non-critical personal use for another several years.
as a result, I am currently using XFS for the most part, but once I get a chance to do another round of testing, ext4 will probably join it. I have a number of systems that have significant numbers of disks, so XFS will probably remain in use.
Posted Dec 4, 2011 1:12 UTC (Sun)
by nix (subscriber, #2304)
[Link]
Posted Dec 1, 2011 3:46 UTC (Thu)
by eli (guest, #11265)
[Link] (1 responses)
Posted Dec 6, 2011 1:06 UTC (Tue)
by pr1268 (guest, #24648)
[Link]
Thanks for the suggestion; I'll give it a try sometime if/when I find a corrupt OGG file. Just to bring some closure to this discussion, I wish to make a few points: Many thanks to everyone's discussion above; I always learn a lot from the comments here on LWN.
Posted Dec 8, 2011 15:34 UTC (Thu)
by lopgok (guest, #43164)
[Link] (8 responses)
I wrote it when I had a ServerWorks chipset on my motherboard that corrupted IDE hard drives when DMA was enabled. These days, however, the utility lets me know there is no bit rot in my files.
It can be found at http://jdeifik.com/ , look for 'md5sum a directory tree'. It is GPL3 code. It works independently from the files being checksummed and independently of the file system. I have found flaky disks that passed every other test with this utility.
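(The following is not lopgok's utility, just a sketch of the same approach in Python for readers who want the idea at a glance. It keeps a per-directory manifest of MD5 digests, creating the manifest on the first run and verifying files against it on later runs; the manifest name MD5SUMS is an assumption.)

# Walk a tree; in each directory, create an MD5 manifest if none exists,
# otherwise re-hash the files and report any that no longer match.
import hashlib
import os
import sys

MANIFEST = "MD5SUMS"

def file_md5(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

for dirpath, _, filenames in os.walk(sys.argv[1]):
    names = sorted(n for n in filenames if n != MANIFEST)
    manifest = os.path.join(dirpath, MANIFEST)
    if not os.path.exists(manifest):
        with open(manifest, "w") as f:
            for n in names:
                f.write("%s  %s\n" % (file_md5(os.path.join(dirpath, n)), n))
        continue
    recorded = {}
    with open(manifest) as f:
        for line in f:
            digest, name = line.rstrip("\n").split("  ", 1)
            recorded[name] = digest
    for n in names:
        if n in recorded and file_md5(os.path.join(dirpath, n)) != recorded[n]:
            print("possible bit rot: %s" % os.path.join(dirpath, n))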
The other thing that can corrupt files is memory errors. Many new computers do not support ECC memory. If you care about data integrity, you should use ECC memory. Intel has this feature for their server chips (Xeons) and AMD has this feature for all of their processors (though not all motherboard makers support it).
Posted Dec 8, 2011 16:24 UTC (Thu)
by nix (subscriber, #2304)
[Link] (7 responses)
ECCRAM is worthwhile, but it is not at all cheap once you factor all that in.
Posted Dec 8, 2011 17:47 UTC (Thu)
by tytso (subscriber, #9993)
[Link] (6 responses)
It's like people who balk at spending an extra $200 to mirror their data, or to provide a hot spare for their RAID array. How much would you be willing to spend to get back your data after you discover it's been vaporized? What kind of chances are you willing to take against that eventuality happen?
It will vary from person to person, but traditionally people are terrible at figuring out cost/benefit tradeoffs.
Posted Dec 8, 2011 19:10 UTC (Thu)
by nix (subscriber, #2304)
[Link] (5 responses)
(Also, last time I tried you couldn't buy a desktop with ECCRAM for love nor money. Servers, sure, but not desktops. So of course all my work stays on the server with battery-backed hardware RAID and ECCRAM, and I just have to hope the desktop doesn't corrupt it in transit.)
Posted Dec 9, 2011 0:57 UTC (Fri)
by tytso (subscriber, #9993)
[Link] (2 responses)
I really like how quickly I can build kernels on this machine. :-)
I'll grant it's not "cheap" in absolute terms, but I've always believed that skimping on a craftsman's tools is false economy.
Posted Dec 9, 2011 7:41 UTC (Fri)
by quotemstr (subscriber, #45331)
[Link]
I have the same machine. Oddly enough, it only supports 12GB of non-ECC memory, at least according to Dell's manual. How does that happen?
(Also, Intel's processor datasheet claims that several hundred gigabytes of either ECC or non-ECC memory should be supported using the integrated memory controller. I wonder why Dell's system supports less.)
Posted Dec 9, 2011 12:40 UTC (Fri)
by nix (subscriber, #2304)
[Link]
EDAC support for my Nehalem systems landed in mainline a couple of years ago but I'll admit to never having looked into how to get it to tell me what errors may have been corrected, so I have no idea how frequent they might be.
(And if it didn't mean dealing with Dell I might consider one of those machines myself...)
Posted Dec 9, 2011 13:53 UTC (Fri)
by james (subscriber, #1325)
[Link] (1 responses)
Even ECC memory isn't that much more expensive: Crucial do a 2x2GB ECC kit for £27 + VAT ($42 in the US) against £19 ($30).
Posted Dec 9, 2011 15:19 UTC (Fri)
by lopgok (guest, #43164)
[Link]
If you buy assembled computers and can't get ECC support without spending big bucks, it is time to switch vendors.
It is true that ECC memory is more expensive and less available than non-ECC memory, but the price difference is around 20% or so, and Newegg and others sell a wide variety of ECC memory. Mainstream memory manufacturers, including Kingston, sell ECC memory.
Of course, virtually all server computers come with ECC memory.
Posted Jan 15, 2012 3:45 UTC (Sun)
by sbergman27 (guest, #10767)
[Link]
The very first time we had a power failure, with a UPS with a bad battery, we experienced corruption in several of those files. Never *ever* *ever* had we experienced such a thing with EXT3. I immediately added nodelalloc as a mount option, and the EXT4 filesystem now seems as resilient as EXT3 ever was. Note that at around the same time as 2.6.30, EXT3 was made less reliable by adding the same 2.6.30 patches to it, and making data=writeback the default journalling mode. So if you do move back to EXT3, make sure to mount with data=journal.
BTW, I've not noted any performance differences mounting EXT4 with nodelalloc. Maybe in a side by side benchmark comparison I'd detect something.
Posted Feb 19, 2013 10:23 UTC (Tue)
by Cato (guest, #7643)
[Link]
You might also like to try ZFS or btrfs - both have enough built-in checksumming that they should detect issues sooner, though in this case Ogg's checksumming is doing that for audio files. With a checksumming FS you could detect whether the corruption is in RAM (seen when writing to file) or on disk (seen when reading from file). ZFS also does periodic scrubbing to validate checksums.
A few wrong bits in a Vorbis stream seem likely to give you more than just "one wrong sample".
shotgun debugging
What you're trying to do with moving back to ext3 is what the Jargon File calls shotgun debugging: trying out some radical move in hopes that this will fix your problem.
Pretty far off-topic, but: it is a rare situation indeed where the removal of information will improve the fidelity of a signal. One might not be able to hear the difference, but I have a hard time imagining how conversion between lossy formats could do anything but degrade the quality. You can't put back something that the first lossy encoding took out, but you can certainly remove parts of the signal that the first encoding preserved.
Most well-done benchmarks I have seen show mostly equivalent performance, with XFS leading the group in scalability, JFS pretty good across the field, and ext4, just like the previous exts, good only on totally freshly loaded filesystems (as it packs newly created files pretty densely) and when there is ample caching (no use of 'O_DIRECT'); both fresh loading and caching mask its fundamental, BSD FFS-derived, downsides.
It is very very easy to do meaningless filesystem benchmarks (the vast majority that I see on LWN and most others are worthless).
"..., for a long time (even after it was the default in Fedora), LVM did not"
I am rather sure at least ext4 and xfs do it that way.
jbd does something similar but I don't want to look it up unless you're really interested.
I shouldn't respond to this troll-bait, but nonetheless...
The big problem with 'ext4' is that its only reason to be is to allow Red Hat customers to upgrade in place existing systems, and what Red Hat wants, Red Hat gets (also because they usually pay for that and the community is very grateful).
Interesting. tytso wasn't working for RH when ext4 started up, and still isn't working for them now. So their influence must be more subtle.
In particular JFS should have been the "default" Linux filesystem instead of ext[23] for a long time. Not making JFS the default was probably the single worst strategic decision for Linux (but it can be argued that letting GKH near the kernel was even worse).
Ah, yeah. Because stable kernels, USB support, mentoring newbies, the driver core, -staging... all these things were bad.
Also, uhm. Didn't he work for Suse?
$ ls -ld /dev/tty9
crw--w---- 1 root tty 4, 9 2011-11-28 14:03 /dev/tty9
$ cat /sys/class/tty/tty9/dev
4:9
ext4 is in large part a new filesystem whose name just happens to be similar to what people are running
ext4 is ext3 with a bunch of new extensions (some incompatible): indeed, initially the ext4 work was going to be done to ext3, until Linus asked for it to be done in a newly-named clone of the code instead. It says a lot for the ext2 code and disk formats that they've been evolvable to this degree.
I wrote a trivial python script to generate a checksum file for each directory's files. If you run it, and it finds a checksum file, it checks that the files in the directory match the checksum file, and if they don't it reports that.
It is very cheap insurance.
It is very cheap insurance.
Look at the price differential between the motherboards and CPUs that support ECCRAM and those that do not. Now add in the extra cost of the RAM.