
Tux3: the other next-generation filesystem

Posted Dec 5, 2008 23:58 UTC (Fri) by njs (subscriber, #40338)
In reply to: Tux3: the other next-generation filesystem by ncm
Parent article: Tux3: the other next-generation filesystem

>Checksumming only the file system's metadata and log, but not the user-level data, is a reasonable compromise

Well, maybe...

Within reason, my goal is to have as much confidence as possible in my data's safety, with as little investment of my time and attention as possible. Leaving safety up to individual apps is a pretty wretched system for achieving this -- it defaults to "unsafe", then I have to manually figure out which stuff needs more guarantees, which I'll screw up, plus I have to worry about all the bugs that may exist in the eleventeen different checksumming systems being used in different codebases... This is the same reason I do whole-disk backups instead of trying to pick and choose which files to save, or leaving backup functionality up to each individual app. (Not as crazy an idea as it sounds -- a DVCS basically has its own backup system, for instance; but I'm not going around adding that functionality to my photo editor and word processor too.)

Obviously if checksumming ends up causing unacceptable slowdowns, then compromises have to be made. But I'm pretty skeptical; it's not like CRC (or even SHA-1) is expensive compared to disk access latency, and the Btrfs and ZFS folks seem to think usable full disk checksumming is possible.

If it's possible I want it.



Tux3: the other next-generation filesystem

Posted Dec 6, 2008 8:26 UTC (Sat) by ncm (guest, #165) [Link] (2 responses)

This is another case where the end-to-end argument applies. Either (a) it's a non-critical application, and backups (which you have to do anyway) provide enough reliability; or (b) it's a critical application, and the file system can't provide enough assurance anyway, and what it could do would interfere with overall performance.

Similarly, if your application is seek-bound, it's in trouble anyway. If performance matters, it should be limited by the sustained streaming capacity of the file system, and then delays from redundant checksum operations really do hurt.

Hence the argument for reliable metadata, anyway: the application can't do that for itself, and it had better not depend on metadata operations being especially fast. Traditionally, serious databases used raw block devices to avoid depending on file system metadata.

Tux3: the other next-generation filesystem

Posted Dec 6, 2008 8:55 UTC (Sat) by njs (subscriber, #40338) [Link] (1 responses)

End-to-end is great, and it absolutely makes sense that special purpose systems like databases may want both additional guarantees and low-overhead access to the drive. But basically none of my important data is in a database; it's scattered all over my hard drive in ordinary files, in a dozen or more formats. If the filesystem *is* your database, as it is for ordinary desktop storage, then that's the only place you can reasonably put your integrity checking.

Backups are also great, but there are cases (slow, quiet, unreported corruption that can easily persist undetected for weeks or more; see upthread) where they do not protect you.

(In some cases you can actually increase integrity too -- if your app checks its checksum when loading a file and it fails, then the data is lost but at least you know it; if btrfs checks a checksum while loading a block and it fails, then it can go pull an uncorrupted copy from the RAID mirror and prevent the data from being lost at all.)
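To make that concrete, here is a toy Python sketch of the verify-then-repair idea; the function names and the in-memory "mirrors" are invented for illustration and are not btrfs internals:

    import hashlib

    def read_block(mirrors, index, expected_digest):
        """Return block `index`, preferring a copy whose SHA-1 matches.

        `mirrors` is a list of lists of byte strings standing in for RAID
        copies; this is an illustrative model, not real filesystem code.
        """
        for copy in mirrors:
            block = copy[index]
            if hashlib.sha1(block).digest() == expected_digest:
                # Good copy found: overwrite any corrupted mirrors with it.
                for other in mirrors:
                    other[index] = block
                return block
        # Every copy failed the checksum: the data is lost, but detectably so.
        raise IOError("checksum mismatch on every mirror for block %d" % index)

The point is that the checksum turns silent corruption into either a transparent repair (when a good mirror exists) or a loud error (when none does).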

>If performance matters, it should be limited by the sustained streaming capacity of the file system, and then delays from redundant checksum operations really do hurt.

Again, I'm not convinced. My year-old laptop does SHA-1 at 200 MB/s (using one core only); the fastest hard-drive in the world (according to storagereview.com) streams at 135 MB/s. Not that you want to devote a CPU to this sort of thing, and RAID arrays can stream faster than a single disk, but CRC32 goes *way* faster than SHA-1 too, and my laptop has neither RAID nor a fancy 15k RPM server drive anyway.
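For anyone who wants to check those throughput numbers on their own hardware, a rough single-core measurement takes only a few lines of Python (the buffer sizes here are arbitrary, and zlib's CRC32 is slower than a tuned in-kernel implementation, so treat the results as a lower bound):

    import hashlib, time, zlib

    def throughput(label, fn, total=256 * 1024 * 1024, chunk=1 << 20):
        # Checksum `total` bytes of zeros in `chunk`-sized pieces, report MB/s.
        buf = bytes(chunk)
        start = time.time()
        for _ in range(total // chunk):
            fn(buf)
        elapsed = time.time() - start
        print("%-6s %6.0f MB/s" % (label, total / elapsed / 1e6))

    throughput("SHA-1", lambda b: hashlib.sha1(b).digest())
    throughput("CRC32", zlib.crc32)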

And anyway my desktop is often seek-bound, alas, and yours is too; it does make things slow, but I don't see why it should make me care less about my data.

Tux3: the other next-generation filesystem

Posted Dec 7, 2008 21:33 UTC (Sun) by ncm (guest, #165) [Link]

For most uses we would benefit from the file system doing as much as it can, and even backing itself up -- although we'd like to be able to bypass whatever gets in the way. But if the file system starts out doing less, the first thing to checksum is the metadata.

