Does anyone know how POHMELFS compares to Ceph?
POHMELFS versus Ceph
Posted Feb 10, 2012 4:58 UTC (Fri) by cmccabe (guest, #60281)
The object storage layer of Ceph seems vaguely similar to Evgeniy's "elliptics network." However, there are some very important differences.
In Ceph, only one object storage daemon ever has the authority to write to a given object. In contrast, the elliptics network is based on the Chord paper from MIT, so there could potentially be multiple writers to a single distributed hash table (DHT) object at once. In his 2010 paper, Evigeny describes the DHT model as "write-always-succeed and eventual consistency."
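To make the Chord reference concrete: in Chord, a key belongs to the first node whose identifier is equal to or follows the key's identifier on a circular ID space. The sketch below shows only that placement rule, with small made-up integer IDs. The function name `successor` and the sample ring are assumptions for illustration; this is not elliptics code, and elliptics' actual routing may differ.

```python
import bisect

def successor(node_ids, key_id):
    """Return the node responsible for key_id on a Chord-style ring.

    The owner is the first node ID >= key_id, wrapping around to the
    smallest node ID when key_id is past the last node.
    """
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key_id)
    return ring[i % len(ring)]

# Hypothetical ring with three nodes.
nodes = [10, 50, 200]
owner_mid = successor(nodes, 60)    # falls between 50 and 200
owner_wrap = successor(nodes, 201)  # past the last node, wraps around
owner_exact = successor(nodes, 50)  # exact match on a node ID
```

Nothing in this rule says who may write; it only says where a key lives. Whether several clients can write that key concurrently is a separate policy decision, which is where the two systems diverge.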
One of the biggest questions about any distributed system is how it handles "split-brain syndrome." In other words, what happens when we cut the network into two halves which cannot talk to one another? In Ceph, only one of those halves would be able to continue functioning. This is accomplished by the monitors, which use Paxos to decide on changes in cluster topology. In contrast, in the elliptics network, it looks like both halves would continue functioning. Then later, if they were reunified, there would be some attempt to "merge the histories."
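The reason only one half can keep going under a majority-based scheme comes down to simple arithmetic: two disjoint groups cannot both hold a strict majority of the same set. A minimal sketch of that rule follows. The function names and monitor counts are hypothetical, and this is not Ceph's monitor code or a Paxos implementation, just the quorum test such systems rely on.

```python
def has_quorum(reachable, total):
    """True if this side of a partition can reach a strict majority."""
    return reachable > total // 2

def partition_status(total, side_a):
    """Split `total` monitors into two sides and report which may proceed."""
    side_b = total - side_a
    return has_quorum(side_a, total), has_quorum(side_b, total)

five_split = partition_status(5, 3)  # 3 monitors on one side, 2 on the other
even_split = partition_status(4, 2)  # 2 and 2: neither side has a majority
```

Note the even split: with four monitors divided two and two, neither side may proceed, which is one reason such clusters are usually deployed with an odd number of voters.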
Merging histories sounds good in theory, but in practice it's kind of a quagmire. What happens if one side of the brain decides to delete all files in /foo and remove the directory, and the other side adds files to this directory? Who should "win"? When the parallel universes implode into one, there are potentially going to be some unhappy users. To be fair, some users seem willing to accept this.
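The /foo case can be written down as two operation logs that cannot both be honored. The sketch below only detects that kind of conflict; it deliberately does not pick a winner, because that is the policy question being raised. The `Op` type and `find_conflicts` function are hypothetical and are not taken from elliptics or POHMELFS.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    kind: str  # "create" or "rmdir"
    path: str

def find_conflicts(side_a, side_b):
    """Return (rmdir_op, create_op) pairs where one side removed a
    directory and the other side created something inside it."""
    conflicts = []
    for a in side_a:
        for b in side_b:
            for removed, created in ((a, b), (b, a)):
                prefix = removed.path.rstrip("/") + "/"
                if (removed.kind == "rmdir"
                        and created.kind == "create"
                        and created.path.startswith(prefix)):
                    conflicts.append((removed, created))
    return conflicts

# One side removes /foo while the other adds a file under it.
log_a = [Op("rmdir", "/foo")]
log_b = [Op("create", "/foo/new.txt"), Op("create", "/foobar/x")]
conflicts = find_conflicts(log_a, log_b)
```

Detection is the easy part. Any automatic resolution (keep the directory, drop the new file, resurrect both) will surprise whoever was on the losing side.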
Another issue is caching. Ceph has true POSIX read-after-write semantics. If you're on one computer and you write a byte to a file at offset 0, and then immediately afterwards someone on another computer reads a byte from offset 0, he'll see exactly what you wrote. In CAP terms, Ceph is a highly consistent system. In contrast, in his commit message, Evigeny says that POHMELFS will "only sync (or close with sync_on_close mount option) or writeback will flush data to remote nodes."
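The difference shows up in a toy model with two clients sharing one server. A write-back client, as the quoted commit message suggests, holds data locally until a sync, so a second client can read stale data in the meantime. All class and method names below are hypothetical; this models only the visibility behavior and is neither POHMELFS code nor a description of how Ceph actually achieves its guarantee.

```python
class Server:
    """Stands in for the remote storage nodes."""
    def __init__(self):
        self.data = {}

class WriteBackClient:
    """Buffers writes locally; the server sees them only after sync()."""
    def __init__(self, server):
        self.server = server
        self.dirty = {}

    def write(self, offset, byte):
        self.dirty[offset] = byte  # stays local until sync()

    def read(self, offset):
        if offset in self.dirty:   # a client always sees its own writes
            return self.dirty[offset]
        return self.server.data.get(offset)

    def sync(self):
        self.server.data.update(self.dirty)
        self.dirty.clear()

server = Server()
writer = WriteBackClient(server)
reader = WriteBackClient(server)

writer.write(0, b"x")
before_sync = reader.read(0)  # the other machine does not see the byte yet
writer.sync()
after_sync = reader.read(0)   # visible only once the writer has flushed
```

Under read-after-write semantics, `before_sync` would already have to be the written byte, with no explicit sync from the writer.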
That actually seems to run counter to what I would call "strict" POSIX semantics. However, I've never seen a formal definition of POSIX filesystem semantics and my usage is kind of informal. If anyone has a document which clarifies it succinctly, I'd love to see it.
Full disclosure: I worked on the Ceph project for a while.
P.S. Evigeny, if anything I've said here is wrong, please let me know. Elliptics and POHMELFS seem like interesting projects and I'm always curious to see what you'll come up with in the future.
P.P.S. Evigeny, if you're reading this, do you have any ideas about avoiding replication storms?
Posted Feb 16, 2012 17:13 UTC (Thu) by raalkml (guest, #72852)
Er, yes. It is "Evgeniy" :)
Posted Apr 6, 2012 5:46 UTC (Fri) by bradfitz (subscriber, #4378)
Posted Apr 6, 2012 7:34 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds