Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 8, 2011 15:34 UTC (Thu) by lopgok (guest, #43164)
In reply to: Improving ext4: bigalloc, inline data, and metadata checksums by pr1268
Parent article: Improving ext4: bigalloc, inline data, and metadata checksums

You should generate a checksum for each file in your filesystem.
I wrote a trivial Python script that generates a checksum file for the files in each directory. If you run it and it finds an existing checksum file, it checks that the files in the directory still match the recorded checksums and reports any that do not.

I wrote it when I had a ServerWorks chipset on my motherboard that corrupted IDE hard drives when DMA was enabled. However, the utility also lets me know there is no bit rot in my files.

It can be found at http://jdeifik.com/ (look for 'md5sum a directory tree'). It is GPL3 code. It works independently of the files being checksummed and independently of the file system. With this utility I have found flaky disks that passed every other test.
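
For readers who want to see the idea in code, here is a minimal sketch of the per-directory checksum approach described above. It is not the script from jdeifik.com: the checksum file name (`checksums.md5`), the function names, and the "digest, two spaces, file name" line format are all assumptions made for illustration. On the first run a directory gets a checksum file; on later runs the files are compared against it.

```python
import hashlib
import os

# Hypothetical name for the per-directory checksum file (an assumption,
# not necessarily what the original utility uses).
CHECKSUM_FILE = "checksums.md5"


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_directory(directory):
    """Map each regular file in the directory (not subdirectories) to its digest."""
    sums = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name != CHECKSUM_FILE and os.path.isfile(path):
            sums[name] = md5_of(path)
    return sums


def read_checksum_file(path):
    """Parse lines of the form '<digest>  <name>' (names with newlines are not handled)."""
    recorded = {}
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        for line in f:
            digest, _, name = line.rstrip("\n").partition("  ")
            if name:
                recorded[name] = digest
    return recorded


def write_checksum_file(path, sums):
    with open(path, "w", encoding="utf-8", errors="surrogateescape") as f:
        for name, digest in sorted(sums.items()):
            f.write(f"{digest}  {name}\n")


def check_or_create(directory):
    """Create the checksum file if absent; otherwise return a list of problems.

    Files added since the checksum file was written are ignored in this sketch.
    """
    checksum_path = os.path.join(directory, CHECKSUM_FILE)
    current = scan_directory(directory)
    if not os.path.exists(checksum_path):
        write_checksum_file(checksum_path, current)
        return []
    problems = []
    for name, digest in read_checksum_file(checksum_path).items():
        full = os.path.join(directory, name)
        if name not in current:
            problems.append(f"{full}: missing")
        elif current[name] != digest:
            problems.append(f"{full}: checksum mismatch")
    return problems


def check_tree(root):
    """Apply check_or_create to the root directory and every directory below it."""
    problems = []
    for directory, _subdirs, _files in os.walk(root):
        problems.extend(check_or_create(directory))
    return problems
```

Calling `check_tree("/some/path")` twice on unchanged data should return an empty list both times; a file whose contents changed in between shows up as a "checksum mismatch" entry. Because the checksums live in ordinary files, the check does not depend on any file system feature.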

The other thing that can corrupt files is memory errors. Many new computers do not support ECC memory. If you care about data integrity, you should use ECC memory. Intel has this feature for their server chips (Xeons) and AMD has this feature for all of their processors (though not all motherboard makers support it).
It is very cheap insurance.


Posted Dec 8, 2011 16:24 UTC (Thu) by nix (subscriber, #2304) [Link] (7 responses)

> It is very cheap insurance.

Look at the price differential between the motherboards and CPUs that support ECCRAM and those that do not. Now add in the extra cost of the RAM.

ECCRAM is worthwhile, but it is not at all cheap once you factor all that in.

Posted Dec 8, 2011 17:47 UTC (Thu) by tytso (subscriber, #9993) [Link] (6 responses)

Whether or not it is cheap depends on how much you value your data.

It's like people who balk at spending an extra $200 to mirror their data, or to provide a hot spare for their RAID array. How much would you be willing to spend to get back your data after you discover it's been vaporized? What kind of chances are you willing to take on that eventuality happening?

It will vary from person to person, but traditionally people are terrible at figuring out cost/benefit tradeoffs.

Posted Dec 8, 2011 19:10 UTC (Thu) by nix (subscriber, #2304) [Link] (5 responses)

Yep. That's why I said it was worthwhile. But 'very cheap'? Not unless 'cheap' means 'costs much more money than other alternatives'. Yes, it has benefits, but immediate financial return is not one of them.

(Also, last time I tried you couldn't buy a desktop with ECCRAM for love nor money. Servers, sure, but not desktops. So of course all my work stays on the server with battery-backed hardware RAID and ECCRAM, and I just have to hope the desktop doesn't corrupt it in transit.)

Posted Dec 9, 2011 0:57 UTC (Fri) by tytso (subscriber, #9993) [Link] (2 responses)

What I have under my desk at work (and I'm quite happy with it) is the Dell T3500 Precision Workstation, which supports up to 24GB of ECC or non-ECC memory. It's not a mini-ATX desktop, but it's definitely not a server, either.

I really like how quickly I can build kernels on this machine. :-)

I'll grant it's not "cheap" in absolute terms, but I've always believed that skimping on a craftsman's tools is false economy.

Posted Dec 9, 2011 7:41 UTC (Fri) by quotemstr (subscriber, #45331) [Link]

> Dell T3500 Precision Workstation, which supports up to 24GB of ECC or non-ECC memory.

I have the same machine. Oddly enough, it only supports 12GB of non-ECC memory, at least according to Dell's manual. How does that happen?

(Also, Intel's processor datasheet claims that several hundred gigabytes of either ECC or non-ECC memory should be supported using the integrated memory controller. I wonder why Dell's system supports less.)

Posted Dec 9, 2011 12:40 UTC (Fri) by nix (subscriber, #2304) [Link]

Oh, agreed. I've seen multiple rounds of friends deciding to save money on a cheap PC, trying to do real work on it, and finding the result a crashy erratic data-corrupting horror that is almost impossible to debug unless you have a second identical machine to swap parts out of... and losing years of working time to these unreliable nightmares. I pay a bit more (well, OK, quite a lot more) and those problems simply don't happen. I don't think this is ECCRAM, though: I think it's simply a matter of tested components with a decent safety margin rather than bargain-basement junk.

EDAC support for my Nehalem systems landed in mainline a couple of years ago but I'll admit to never having looked into how to get it to tell me what errors may have been corrected, so I have no idea how frequent they might be.
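
As a hedged pointer for anyone wondering the same thing: when an EDAC driver is loaded, the kernel exposes per-memory-controller counters under `/sys/devices/system/edac/mc/`, where `ce_count` holds corrected errors and `ue_count` holds uncorrected ones. The sketch below reads those two files; the function names are illustrative, and it assumes the relevant EDAC driver is loaded (if not, the directory is empty or absent and the result is simply an empty dictionary).

```python
import glob
import os

# Standard sysfs location for EDAC memory controllers.
EDAC_MC_ROOT = "/sys/devices/system/edac/mc"


def read_int(path):
    """Return the integer stored in a sysfs file, or None if it is unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def edac_error_counts(root=EDAC_MC_ROOT):
    """Return {controller: {"corrected": n, "uncorrected": n}} for each mcN directory."""
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        counts[os.path.basename(mc_dir)] = {
            "corrected": read_int(os.path.join(mc_dir, "ce_count")),
            "uncorrected": read_int(os.path.join(mc_dir, "ue_count")),
        }
    return counts
```

Printing `edac_error_counts()` periodically (or comparing it between runs) gives a rough idea of how often ECC is correcting errors on a given machine.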

(And if it didn't mean dealing with Dell I might consider one of those machines myself...)

Posted Dec 9, 2011 13:53 UTC (Fri) by james (subscriber, #1325) [Link] (1 responses)

AMD processors since the Athlon 64 all support ECC, and most Asus AMD boards (even cheap ones) wire the lines up.

Even ECC memory isn't that much more expensive: Crucial do a 2x2GB ECC kit for £27 + VAT ($42 in the US) against £19 ($30).

Posted Dec 9, 2011 15:19 UTC (Fri) by lopgok (guest, #43164) [Link]

I agree. The last 3 motherboards I have bought were for AMD processors. I bought a 3-core Phenom II, an Asus motherboard, and 4GB of ECC RAM for around $200. I have no idea why Intel only supports ECC on their server motherboards. For me, this is a critical feature. In my experience, many Gigabyte motherboards do not support ECC, so check the motherboard manual or the list of supported memory before buying. In fact AMD supports IBM's Chipkill technology, which will detect 4-bit errors and correct 3-bit errors. In addition, my Asus motherboards support memory scrubbing, which can help detect memory errors in a timely fashion.

If you buy assembled computers and can't get ECC support without spending big bucks, it is time to switch vendors.

It is true that ECC memory is more expensive and less available than non-ECC memory, but the price difference is around 20% or so, and Newegg and others sell a wide variety of ECC memory. Mainstream memory manufacturers, including Kingston, sell ECC memory.

Of course, virtually all server computers come with ECC memory.

