
POHMELFS versus Ceph

Posted Feb 9, 2012 12:02 UTC (Thu) by abacus (guest, #49001)
Parent article: POHMELFS returns

Does anyone know how POHMELFS compares to Ceph?



POHMELFS versus Ceph

Posted Feb 10, 2012 4:58 UTC (Fri) by cmccabe (guest, #60281) [Link]

Ceph has a three-tier architecture: monitors, metadata servers, and object storage daemons.

The object storage layer of Ceph seems vaguely similar to Evgeniy's "elliptics network." However, there are some very important differences.

In Ceph, there is only ever one object storage daemon that has the authority to write to an object. In contrast, the elliptics network is based on the Chord paper from MIT.[1] So potentially there could be multiple writers to a single distributed hash table (DHT) object at once. In his 2010 paper [2], Evigeny describes the DHT model as "write-always-succeed and eventual consistency."
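For readers who have not seen Chord, here is a minimal sketch of its key placement rule: node and key names are hashed onto a ring, and a key belongs to the first node whose ID is equal to or follows the key's ID, wrapping around at the end. The 16-bit ring size and the helper names `ring_id` and `successor` are my own choices for illustration; this is not taken from the elliptics code, which may place keys differently.

```python
import bisect
import hashlib

RING_BITS = 16  # assumption: a tiny ring for illustration; Chord uses the full hash width


def ring_id(name: str) -> int:
    """Hash a node or key name onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


def successor(node_ids, key_id):
    """Return the first node ID >= key_id, wrapping to the smallest ID.

    node_ids must be a non-empty list sorted in ascending order.
    """
    index = bisect.bisect_left(node_ids, key_id)
    return node_ids[index % len(node_ids)]


if __name__ == "__main__":
    nodes = sorted(ring_id(n) for n in ("node-a", "node-b", "node-c"))
    key = ring_id("some-object")
    print("key", key, "is stored on node", successor(nodes, key))
```

Nothing in this rule by itself serializes writers, which is why the consistency model has to be settled elsewhere in the design.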

One of the biggest questions about any distributed system is how it handles "split-brain syndrome." In other words, what happens when we cut the network into two halves which cannot talk to one another? In Ceph, only one of those halves would be able to continue functioning. This is accomplished by the monitors, which use Paxos [3] to decide on changes in cluster topology. In contrast, in the elliptics network, it looks like both halves would continue functioning. Then later, if they were reunified, we would make some attempt to "merge the histories."
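The reason only one half can keep going comes down to the majority rule that Paxos-style protocols rely on. A rough sketch, assuming a fixed set of monitors and a clean two-way partition (the function names are hypothetical, and this is not Ceph's monitor code):

```python
def has_quorum(reachable_monitors: int, total_monitors: int) -> bool:
    """A side may keep deciding only if it can reach a strict majority."""
    return reachable_monitors > total_monitors // 2


def partition_status(total_monitors: int, side_a: int):
    """Return (side A may proceed, side B may proceed) for a two-way split."""
    side_b = total_monitors - side_a
    return (has_quorum(side_a, total_monitors),
            has_quorum(side_b, total_monitors))


if __name__ == "__main__":
    print(partition_status(5, 3))  # the side with 3 of 5 monitors keeps a majority
    print(partition_status(4, 2))  # an even split leaves neither side with a majority
```

Since two disjoint sides can never both hold a strict majority, there is never more than one history to reconcile afterwards. An even split stalls both sides, which is one reason an odd number of monitors is the usual recommendation.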

Merging histories sounds good in theory, but in practice it's kind of a quagmire. What happens if one side of the brain decides to delete all files in /foo and remove the directory, while the other side adds files to that directory? Who should "win"? When the parallel universes implode into one, there are potentially going to be some unhappy users. To be fair, some users seem willing to accept this.
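To make the /foo example concrete, here is a small sketch that only detects this kind of conflict between two operation logs. The log format (tuples of an operation name and a path) and the function `find_conflicts` are invented for illustration; I am not claiming elliptics records or merges history this way.

```python
def find_conflicts(side_a_ops, side_b_ops):
    """Return (op_a, op_b) pairs where one side removed a directory
    that the other side created a file inside.

    Each op is a tuple such as ("rmdir", "/foo") or ("create", "/foo/x").
    """
    conflicts = []
    for op_a in side_a_ops:
        for op_b in side_b_ops:
            # Check both directions: A removed and B created, or vice versa.
            for removal, creation in ((op_a, op_b), (op_b, op_a)):
                if removal[0] != "rmdir" or creation[0] != "create":
                    continue
                prefix = removal[1].rstrip("/") + "/"
                if creation[1].startswith(prefix):
                    conflicts.append((op_a, op_b))
    return conflicts


if __name__ == "__main__":
    side_a = [("rmdir", "/foo")]
    side_b = [("create", "/foo/new.txt"), ("create", "/bar/other.txt")]
    for op_a, op_b in find_conflicts(side_a, side_b):
        print("conflict:", op_a, "vs", op_b)
```

Detection is the easy part. The code cannot say which operation should win, and that policy decision is where the unhappy users come from.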

Another issue is caching. Ceph has true POSIX read-after-write semantics. If you're on one computer and you write a byte to a file at offset 0, and then immediately afterwards someone on another computer reads a byte from offset 0, he'll see exactly what you wrote. In CAP terms [4], Ceph is a highly consistent system. In contrast, in his commit message, Evigeny says that POHMELFS will "only sync (or close with sync_on_close mount option) or writeback will flush data to remote nodes."

That actually seems to run counter to what I would call "strict" POSIX semantics. However, I've never seen a formal definition of POSIX filesystem semantics and my usage is kind of informal. If anyone has a document which clarifies it succinctly, I'd love to see it.
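The visible difference between the two caching behaviours can be modelled with a toy example. This is only a sketch of what a second client observes; the class names are made up, a plain dictionary stands in for the remote nodes, and neither Ceph nor POHMELFS is implemented like this.

```python
class RemoteStore:
    """Stands in for the remote storage nodes."""

    def __init__(self):
        self.data = {}


class WriteThroughClient:
    """Every write reaches the store before write() returns."""

    def __init__(self, store):
        self.store = store

    def write(self, key, value):
        self.store.data[key] = value

    def read(self, key):
        return self.store.data.get(key)


class WritebackClient:
    """Writes stay in a local dirty cache until sync() is called."""

    def __init__(self, store):
        self.store = store
        self.dirty = {}

    def write(self, key, value):
        self.dirty[key] = value

    def read(self, key):
        # A client always sees its own unflushed writes first.
        if key in self.dirty:
            return self.dirty[key]
        return self.store.data.get(key)

    def sync(self):
        self.store.data.update(self.dirty)
        self.dirty.clear()


if __name__ == "__main__":
    store = RemoteStore()
    writer, reader = WritebackClient(store), WritebackClient(store)
    writer.write("file@0", b"x")
    print(reader.read("file@0"))  # the other client does not see the write yet
    writer.sync()
    print(reader.read("file@0"))  # visible only after the flush
```

With the write-through client, the second reader sees the byte immediately, which is the read-after-write behaviour described above for Ceph.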

Full disclosure: I worked on the Ceph project for a while.

P.S. Evigeny, if anything I've said here is wrong, please let me know. Elliptics and POHMELFS seem like interesting projects and I'm always curious to see what you'll come up with in the future.

P.P.S. Evigeny, if you're reading this, do you have any ideas about avoiding replication storms?

[1] http://www.pdos.lcs.mit.edu/chord
[2] http://www.ioremap.net/tmp/lk2010-elliptics-text.pdf
[3] http://the-paper-trail.org/blog/?p=173
[4] http://en.wikipedia.org/wiki/CAP_theorem

POHMELFS versus Ceph

Posted Feb 16, 2012 17:13 UTC (Thu) by raalkml (guest, #72852) [Link]

> P.S. Evigeny, if anything I've said here is wrong

Er, yes. It is "Evgeniy" :)

POHMELFS versus Ceph

Posted Apr 6, 2012 5:46 UTC (Fri) by bradfitz (subscriber, #4378) [Link]

POHMELFS versus Ceph

Posted Apr 6, 2012 7:34 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Well, the newest Russian transliteration rules (I hate them) state that it should be "Evgeni" :)

