LWN: Comments on "Distributed storage"

Distributed storage

renox — Thu, 30 Aug 2007 18:13:33 +0000

>> Evgeniy has said that the Reed-Solomon codes used for traditional RAID are not fast enough for distributed arrays. He suggests that WEAVER codes might be used instead.<<

Funny that: I just read a paper by Mr risk at Google, who had a file corruption caused by a bad switch. The corruption went undetected because a double-bit error got through the WEAVER codes, so he advocates CRC instead.
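As a rough illustration of the guarantee being advocated (this models neither WEAVER codes nor the specific incident), CRC-32 detects every two-bit error in a message far longer than an Ethernet frame. The Python sketch below checks that exhaustively for an arbitrary 16-byte block using the standard library's zlib.crc32. The helper names are made up for this example.

```python
import zlib
from itertools import combinations


def flip_bits(data: bytes, positions) -> bytes:
    """Return a copy of data with the given bit positions inverted."""
    buf = bytearray(data)
    for pos in positions:
        buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)


def undetected_double_bit_errors(data: bytes) -> int:
    """Count two-bit error patterns that leave the CRC-32 unchanged."""
    reference = zlib.crc32(data)
    nbits = len(data) * 8
    misses = 0
    # Try every pair of distinct bit positions in the block.
    for pair in combinations(range(nbits), 2):
        if zlib.crc32(flip_bits(data, pair)) == reference:
            misses += 1
    return misses


if __name__ == "__main__":
    block = bytes(range(16))  # arbitrary 16-byte (128-bit) test block
    print(undetected_double_bit_errors(block))
```

All 8128 two-bit patterns change the CRC, so the count is zero.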

Distributed storage

JohnNilsson — Sun, 26 Aug 2007 17:28:09 +0000

Does anyone know what happened to ddraid? [1]

[1] http://sourceware.org/cluster/ddraid/

Lack of data integrity checks

giraffedata — Sat, 25 Aug 2007 19:29:48 +0000

I'm unclear on what corruptions you're concerned about. When people say "end to end," they're pointing to the fact that something could get corrupted before or after some other integrity check is done. Are you saying there's a significant risk that the data gets corrupted inside a router (outside of Ethernet integrity checks) or inside the client or server network stack (outside of UDP integrity checks)? Are we talking about OS bugs?

Just wondering, because while all kinds of failures are possible, it wouldn't make sense to protect against some risk that we routinely accept in other areas.

You also mention the UDP checksum as simply being too weak. If that's the problem, then I would just refer to "additional integrity checks" rather than emphasize "end to end."

Lack of data integrity checks

zdzichu — Fri, 24 Aug 2007 21:01:51 +0000

That's what IPsec is for. AH, or ESP without encryption (hash only), will catch errors missed by TCP/UDP checksums.

Lack of data integrity checks

brouhaha — Fri, 24 Aug 2007 16:30:16 +0000

The issue isn't whether a 32-bit CRC is good enough to protect a packet. For maximum-length normal Ethernet frames, I would claim that it is good enough. We're trying to detect errors here, not to make the data secure against deliberate alteration. If you need to protect against an adversary that may introduce deliberate alterations in your data, you need cryptography.

The issue for error detection is that the Ethernet FCS only applies for one hop of a route, and gets recomputed by each router along the way. Thus it does not offer end-to-end protection. The packet will have opportunities to be corrupted between hops, and the node that the packet finally arrives at can only trust the FCS to mean that it wasn't corrupted on the wire since leaving the last router.
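A toy Python model of the per-hop recomputation problem, assuming a hypothetical router that verifies the incoming CRC, suffers a one-bit memory error while the packet sits in its buffer, and then computes a fresh CRC for the outgoing link. zlib.crc32 stands in for the real Ethernet FCS, and the end-to-end trailer is likewise an assumption for illustration. The final hop's check passes, while a check computed by the original sender fails.

```python
from __future__ import annotations

import zlib


def add_fcs(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32, standing in for a link-layer FCS."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def check_fcs(frame: bytes) -> bytes:
    """Verify and strip the trailing link-layer CRC; raise on mismatch."""
    payload, fcs = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "little") != fcs:
        raise ValueError("FCS mismatch")
    return payload


def router(frame: bytes, corrupt_in_memory: bool) -> bytes:
    """Hypothetical router: check the inbound FCS, then re-frame.

    If corrupt_in_memory is set, one bit flips after the inbound check and
    before the outbound FCS is computed, so the new FCS covers bad data.
    """
    payload = bytearray(check_fcs(frame))
    if corrupt_in_memory:
        payload[0] ^= 0x01
    return add_fcs(bytes(payload))


def add_e2e(data: bytes) -> bytes:
    """Sender-computed CRC that no router ever recomputes."""
    return data + zlib.crc32(data).to_bytes(4, "big")


def verify_e2e(packet: bytes) -> bool:
    data, crc = packet[:-4], packet[-4:]
    return zlib.crc32(data).to_bytes(4, "big") == crc


def deliver(data: bytes, hops: int, bad_hop: int) -> tuple[bool, bool]:
    """Send data across `hops` routers; return (fcs_ok, end_to_end_ok)."""
    frame = add_fcs(add_e2e(data))
    for hop in range(hops):
        frame = router(frame, corrupt_in_memory=(hop == bad_hop))
    try:
        packet = check_fcs(frame)
    except ValueError:
        return False, False
    return True, verify_e2e(packet)


if __name__ == "__main__":
    print(deliver(b"block 42 contents", hops=3, bad_hop=1))   # (True, False)
    print(deliver(b"block 42 contents", hops=3, bad_hop=-1))  # (True, True)
```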

A UDP checksum is both better and worse. It's better in that it is end-to-end, but it's far worse in that a 16-bit checksum has a much lower probability of detecting errors than a 32-bit CRC. Part of the weakness is the 16-bit size, but part of it is the nature of a checksum.
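To make the "nature of a checksum" point concrete, here is a minimal Python sketch of the 16-bit ones'-complement sum in the style of RFC 1071 (leaving out the UDP pseudo-header) next to zlib.crc32. Because the sum only adds 16-bit words, it cannot notice words arriving in swapped order, nor two flips in the same bit position that cancel out in the sum. The sample bytes are contrived so that each corruption changes exactly two bits, a case CRC-32 is guaranteed to detect.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def compare(original: bytes, corrupted: bytes) -> dict:
    """Report which of the two checks notices the corruption."""
    return {
        "internet_checksum_detects":
            internet_checksum(original) != internet_checksum(corrupted),
        "crc32_detects": zlib.crc32(original) != zlib.crc32(corrupted),
    }


if __name__ == "__main__":
    # Case 1: the words 0x0001 and 0x0003 arrive in swapped order.
    print(compare(bytes([0x00, 0x01, 0x00, 0x03]),
                  bytes([0x00, 0x03, 0x00, 0x01])))
    # Case 2: bit 9 is set in one word and cleared in another, so the sum
    # is unchanged (0x0000 + 0x0F00 == 0x0200 + 0x0D00).
    print(compare(bytes([0x00, 0x00, 0x0F, 0x00]),
                  bytes([0x02, 0x00, 0x0D, 0x00])))
```

Both cases print {'internet_checksum_detects': False, 'crc32_detects': True}. On top of blind spots like these, random corruption slips past any 16-bit check roughly once in 65536 tries, versus roughly once in 2^32 for a 32-bit CRC.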

I'm not arguing that the integrity checking should be done at the application layer. Although there are certainly applications that should do that, what I'm arguing for is that the remote block device client and server code need to do end-to-end error checking at their own level in the protocol stack.

Lack of data integrity checks

intgr — Fri, 24 Aug 2007 08:46:25 +0000

One number is fine if it is long enough; relying on a 32-bit checksum is naive indeed.

The MD5 TCP checksum feature in Linux kernels might be useful, but as it is not offloaded to the networking hardware, it's too slow for >100Mbit Ethernet. Employing a faster checksum function on the application layer sounds like a more practical idea.

What about DRBD?

Felix_the_Mac — Thu, 23 Aug 2007 19:38:41 +0000

I had been looking forward to DRBD being submitted and eventually included in the kernel, since I am planning to implement it at work this year.

However it looks to me (and you should take anything I say with a pinch of salt) like this proposal is going to lead to the common situation where there are 2 proposed ways of achieving some functionality.

This generally leads to a drawn out process in which each proposal is repeatedly modified and criticised until the weight of opinion causes one to be accepted or the other to give up and go home.

This may cause difficulty for the DRBD developers since they have an existing installed base of users and this may prevent them undertaking major redesigns/rewrites.

One hopes and expects that at the end of the day the kernel will end up with the best designed solution.

Lack of data integrity checks

alex — Thu, 23 Aug 2007 19:22:45 +0000

There was a very interesting talk given by a friend of mine from Google about the sort of failures they experience. One example was a data corruption event that was caught by neither the TCP checksums nor the filesystem's own internal checksums.

You don't protect your data with just one number....

http://www.ukuug.org/events/spring2007/programme/ThatCoul...

GlusterFS: Distributed storage at the filesystem level

exco — Thu, 23 Aug 2007 18:36:25 +0000

If you are interested in distributed storage:
GlusterFS implements many interesting concepts
and keeps the whole system simple.

http://www.gluster.org/docs/index.php/GlusterFS_User_Guide

All the work is done at filesystem level, not at the block device level.

Lack of data integrity checks

brouhaha — Thu, 23 Aug 2007 17:51:39 +0000

>> There is no data integrity checking built into the DST networking layer; it relies on the networking code to handle that aspect of things.<<

He's living in a fool's paradise if he thinks that TCP or UDP checksums or the link-level FCS (e.g., Ethernet CRC) are going to be sufficient to guarantee data integrity. I've seen far too many cases where NFS caused data corruption due to the lack of end-to-end checks.

He should define some end-to-end checking, and allow it to be disabled by people that insist on living dangerously.

The checksum/CRC/whatever should be computed over the payload data AND the block identification (device ID, block number), so as to guarantee both that the data has not been corrupted in transit, and that it really is the requested block rather than some other block.
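A minimal sketch of that suggestion in Python, assuming a hypothetical wire layout (32-bit device ID, 64-bit block number) that is not DST's actual protocol. The server computes one CRC-32 over the block identity followed by the payload; the client recomputes it using the identity it requested, so a reply carrying the wrong block fails even when its payload is intact. The all-zero block is deliberate: a payload-only CRC could not tell block 7 from block 8 here.

```python
from __future__ import annotations

import struct
import zlib

# Hypothetical block-identity layout: 32-bit device ID followed by a
# 64-bit block number, both big-endian. Not DST's real format.
BLOCK_ID = struct.Struct("!IQ")


def block_crc(device_id: int, block_no: int, payload: bytes) -> int:
    """CRC-32 over the block identity first, then the payload bytes."""
    crc = zlib.crc32(BLOCK_ID.pack(device_id, block_no))
    return zlib.crc32(payload, crc)  # continue the same running CRC


def server_reply(device_id: int, block_no: int, payload: bytes) -> tuple[bytes, int]:
    """Server side: send the block it read plus its identity-bound CRC."""
    return payload, block_crc(device_id, block_no, payload)


def client_accepts(device_id: int, block_no: int, payload: bytes, crc: int) -> bool:
    """Client side: recompute with the identity it *requested*."""
    return block_crc(device_id, block_no, payload) == crc


if __name__ == "__main__":
    data = bytes(512)  # a 512-byte block of zeros
    payload, crc = server_reply(1, 7, data)
    print(client_accepts(1, 7, payload, crc))   # intact reply: True
    damaged = bytes([payload[0] ^ 0x80]) + payload[1:]
    print(client_accepts(1, 7, damaged, crc))   # bit flipped in transit: False
    payload, crc = server_reply(1, 8, data)     # server read the wrong block
    print(client_accepts(1, 7, payload, crc))   # wrong block: False
```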

What about DRBD?

osma — Thu, 23 Aug 2007 12:17:42 +0000

How does this relate to DRBD?
http://www.drbd.org

Distributed storage

nix — Thu, 23 Aug 2007 12:09:15 +0000

GFS runs *atop* some distributed block device implementation (once iSCSI only, but now NBD+dm as well), and provides shared locking and so on so that lots of systems can access one filesystem at once. It could just as well run atop this (well, patches would probably be required since currently clustered LVM is utterly dependent upon dm, but I don't see why you couldn't run dm atop this as well.)

Distributed storage

rwmj — Thu, 23 Aug 2007 08:36:48 +0000

How is this different from things like the Sistina / Red Hat GFS?

Rich.