
Tux3: the other next-generation filesystem


Posted Dec 16, 2008 1:42 UTC (Tue) by daniel (subscriber, #3181)
In reply to: Tux3: the other next-generation filesystem by njs
Parent article: Tux3: the other next-generation filesystem

I've only lived with maybe a few dozen disks in my life, but I've still seen corruption like that too -- in this case, it turned out that the disk was fine, but one of the connections on the RAID card was bad, and was silently flipping single bits on reads that went to that disk (so it was nondeterministic, depending on which mirror got hit on any given cache fill, and quietly persisted even after the usual fix of replacing the disk).

Luckily the box happened to be hosting a modern DVCS server (the first, in fact), which was doing its own strong validation on everything it read from the disk, and started complaining very loudly. There's no saying how much stuff on this (busy, multi-user, shared) machine would have gotten corrupted before someone noticed otherwise, though... and backups are no help, either.


Our ddsnap-style checksumming at replication time would have caught that corruption promptly.

If there comes a day when there are two great filesystems and one is a little slower but has checksumming, I'm choosing the checksumming one. Saving milliseconds (of computer time) is not worth losing years (of work).

It is not milliseconds; it is a significant fraction of your CPU, no matter how powerful. But yes, if extra checking is important to you, you should be able to have it. Whether block checksums belong in the filesystem rather than the volume manager is another question. There may be a powerful efficiency argument that checksumming has to be done by the filesystem, not the volume manager. If so, I would like to see it.
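To make the cost being debated concrete, here is a minimal sketch of block checksumming at a block layer: store a CRC32 per block on write, then recompute and compare on every read. `ChecksummedDevice`, `BLOCK_SIZE`, and the in-memory layout are hypothetical illustrations, not Tux3's design or any real volume manager's interface; the per-read `zlib.crc32` call is where the CPU cost comes from.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


class ChecksummedDevice:
    """Hypothetical in-memory block device that keeps a CRC32 per block,
    updated on write and verified on read. Not Tux3 or any real layer."""

    def __init__(self, nblocks):
        self.data = bytearray(nblocks * BLOCK_SIZE)
        # Every block starts zero-filled, so seed the table with that checksum.
        self.sums = [zlib.crc32(bytes(BLOCK_SIZE))] * nblocks

    def write(self, n, block):
        if len(block) != BLOCK_SIZE:
            raise ValueError("block must be exactly BLOCK_SIZE bytes")
        self.data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE] = block
        self.sums[n] = zlib.crc32(block)  # checksum recorded at write time

    def read(self, n):
        block = bytes(self.data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE])
        # Recompute over what was actually read; a flipped bit shows up here.
        if zlib.crc32(block) != self.sums[n]:
            raise OSError(f"checksum mismatch on block {n}")
        return block
```

Flipping a bit directly in `dev.data` (simulating the bad RAID connection described earlier in the thread) makes the next `read()` of that block raise instead of silently returning bad data.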

Anyway, when the time comes that block checksumming rises to the top of the list of things to do, we will make sure Tux3 has something respectable, one way or another. Note that checksumming at replication time already gets nearly all the benefit at a very modest CPU cost.
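One way to picture checksumming at replication time: compute a checksum per block on the source and on the replica and compare them, so only the checksums need to be exchanged. This is a generic sketch assuming both volumes are available as byte images; it is not ddsnap's actual protocol, and `block_sums` and `divergent_blocks` are hypothetical names.

```python
import zlib


def block_sums(image, block_size=4096):
    """CRC32 of each block of a volume image given as bytes."""
    return [zlib.crc32(image[i:i + block_size])
            for i in range(0, len(image), block_size)]


def divergent_blocks(source, replica, block_size=4096):
    """Block numbers whose checksums differ between source and replica."""
    a = block_sums(source, block_size)
    b = block_sums(replica, block_size)
    if len(a) != len(b):
        raise ValueError("source and replica differ in size")
    return [n for n, (x, y) in enumerate(zip(a, b)) if x != y]
```

Because each side hashes its own blocks locally, the comparison costs one pass of CRC32 per side plus a few bytes per block on the wire.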

If you want to rank the relative importance of features, replication way beats checksumming. It takes you instantly from having no backup or really awful backup, to having great backup with error detection. So getting to that state with minimal distractions seems like an awfully good idea.



Tux3: the other next-generation filesystem

Posted Dec 21, 2008 12:26 UTC (Sun) by njs (guest, #40338)

> Our ddsnap-style checksumming at replication time would have caught that corruption promptly.

What is that, and how does it work? I'm curious...

In general, I don't see how replication can help in the situation I encountered -- basically, some data on the disk magically changed without OS intervention. The only way to distinguish between that and a real data change is if you are somehow hooked into the OS and watching the writes it issues. Maybe ddsnap does that?
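The distinction drawn here can be sketched as follows, assuming a hypothetical log of block numbers the OS wrote since the last sync (the kind of write hook being asked about; whether ddsnap keeps such a log is not established here). Blocks that differ between source and replica and appear in the log are ordinary changes; blocks that differ with no logged write are suspected silent corruption.

```python
def classify_divergence(divergent, written_since_sync):
    """Split divergent block numbers into expected changes (the OS issued a
    write since the last sync) and suspects (no write was issued).
    `written_since_sync` is a hypothetical write log, not a real ddsnap API."""
    written = set(written_since_sync)
    expected = [n for n in divergent if n in written]
    suspect = [n for n in divergent if n not in written]
    return expected, suspect
```

For example, if blocks 1, 3 and 5 differ but only block 3 was written, blocks 1 and 5 are flagged as possible corruption.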

>It is not milliseconds, it is a significant fraction of your CPU, no matter how powerful.

Can you elaborate? On my year-old laptop, crc32 over 4 KiB blocks runs at >625 MiB/s on one core (adler32 is faster still), and the disk, with perfect streaming, manages to write at ~60 MiB/s, so by my calculation the worst case is 5% CPU. That's enough that it could matter occasionally, but in fact seek-free workloads are very rare... and CPUs continue to follow Moore's law (checksumming is parallelizable), so it seems to me that that number will be <1% by the time tux3 is in production :-).
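The arithmetic above, reproduced in a small sketch: 60 / 625 is about 9.6% of one core, and the 5% figure follows if the work is split across two cores (an assumption about the laptop, which the comment does not state). `cpu_fraction` and `crc32_throughput_mib_s` are illustrative helpers; the second gives only a rough local measurement of Python's `zlib.crc32`, which will differ from an in-kernel implementation.

```python
import time
import zlib


def cpu_fraction(disk_mib_s, crc_mib_s, cores=1):
    """Fraction of total CPU spent checksumming when the disk streams at
    disk_mib_s and one core checksums at crc_mib_s, assuming the work
    splits perfectly across `cores`."""
    return disk_mib_s / crc_mib_s / cores


def crc32_throughput_mib_s(total_mib=64, block=4096):
    """Rough single-core zlib.crc32 throughput over `block`-byte buffers."""
    buf = bytes(block)
    count = total_mib * 1024 * 1024 // block
    start = time.perf_counter()
    for _ in range(count):
        zlib.crc32(buf)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return count * block / (1024 * 1024) / elapsed


print(f"{cpu_fraction(60, 625):.1%} of one core")            # 9.6% of one core
print(f"{cpu_fraction(60, 625, cores=2):.1%} of two cores")  # 4.8% of two cores
```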

No opinion on volume manager vs. filesystem (so long as the interface doesn't devolve into distinct camps of developers pushing responsibilities off on each other); I could imagine there being locality benefits if your Merkle tree follows the filesystem topology, but eh.
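For readers unfamiliar with the term, a minimal Merkle tree sketch: hash each block, then repeatedly hash pairs of child hashes until one root remains, so changing any block changes the root, and an incremental implementation only needs to rehash the path from that block to the root. This version builds a flat binary tree over block order with SHA-256 and duplicates the last node on odd levels (one common convention); a tree that followed filesystem topology, as suggested above, would instead hash each directory over its entries' hashes.

```python
import hashlib


def merkle_root(blocks):
    """Root hash of a binary Merkle tree over a list of byte blocks."""
    if not blocks:
        return hashlib.sha256(b"").digest()
    # Leaves: hash of each data block.
    level = [hashlib.sha256(b).digest() for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        # Parents: hash of the concatenated child hashes.
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```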

>If you want to rank the relative importance of features, replication way beats checksumming.

Fair enough, but I'll just observe that since I do have a perfectly adequate backup system in place already, replication doesn't get *me* anything extra, while checksumming does :-).

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds