
POHMELFS returns

By Jonathan Corbet
February 8, 2012
LWN wrote briefly about the POHMELFS filesystem in early 2008; since then, POHMELFS has languished in the staging tree without much interest or activity. The POHMELFS developer, Evgeniy Polyakov, expressed his unhappiness with the development process and disappeared from the kernel community for some time.

Now, though, Evgeniy is back with a new POHMELFS release. He said:

It went a long way from the parallel NFS design which lived in drivers/staging/pohmelfs for years, effectively without a use case - that design was dead.

New pohmelfs uses the elliptics network as its storage backend, which has proved to be an effective distributed system. Elliptics has been used in production at the Yandex search company for several years now, and clusters range from small (6 nodes in 3 datacenters hosting 15 billion small files) to hundreds of nodes scaling to 1PB, used for streaming.

This time around, he is asking that the filesystem be merged straight into the mainline without making a stop in the staging tree. But merging a filesystem is hard without reviews from the virtual filesystem maintainers, and no such reviews have yet been done. So Evgeniy may have to wait a little while longer yet.

Index entries for this article
Kernel: Filesystems/Network



POHMELFS returns

Posted Feb 9, 2012 11:06 UTC (Thu) by nix (subscriber, #2304) [Link] (7 responses)

So, in place of the old POHMELFS, which had no use case other than 'just like NFS, only better, look, you can take an existing filesystem and distribute it across the network instantly!' (I'm not sure I can think of a more common use case), we have... this, which requires you to shift all your FS data onto new storage which cannot be accessed without pohmelfs.

Not an improvement, sorry.

I think I'll try out NFSv4 one of these days. Maybe it's got inotify support now. I'd really like something NFSish ('take an FS and make it distributed, you don't need a cluster or special fabrics or anything like that') but that is closer to POSIX and also preferably supports inotify so that modern KDE and GNOME versions have a chance of working properly if your home directory is exported over it.
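
The gap is easy to demonstrate. Below is a minimal watcher sketch (assuming Linux and a hypothetical /tmp/watched directory, using the raw libc calls): on a local filesystem it reports changes made by any process, but an NFS or FUSE client never generates events for writes arriving from other hosts, which is exactly what breaks desktop environments over a network home directory.

    import ctypes, os, struct

    libc = ctypes.CDLL("libc.so.6", use_errno=True)

    IN_CREATE, IN_DELETE, IN_MODIFY = 0x100, 0x200, 0x002

    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), os.strerror(ctypes.get_errno()))
    # /tmp/watched is a hypothetical directory; point it anywhere.
    libc.inotify_add_watch(fd, b"/tmp/watched",
                           IN_CREATE | IN_DELETE | IN_MODIFY)

    while True:
        buf = os.read(fd, 4096)
        offset = 0
        while offset < len(buf):
            # struct inotify_event: int wd; u32 mask, cookie, len; char name[]
            _wd, mask, _cookie, length = struct.unpack_from("iIII", buf, offset)
            name = buf[offset + 16 : offset + 16 + length].rstrip(b"\0")
            print(hex(mask), name.decode())
            offset += 16 + length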

POHMELFS returns

Posted Feb 9, 2012 12:09 UTC (Thu) by epa (subscriber, #39769) [Link] (2 responses)

That was also the flaw with the Andrew File System (AFS) I believe.

POHMELFS returns

Posted Feb 9, 2012 21:18 UTC (Thu) by nix (subscriber, #2304) [Link] (1 response)

Yeah. AFS made NFS look like the soul of POSIX-compliance: no cross-directory hardlinks, close() with extra magic effects (IIRC), and its own very non-Unixlike permissions system. (Ironically it would look more Unixlike today than when it was originally written, because ACLs are fairly similar to the AFS permission model.)

POHMELFS returns

Posted Feb 10, 2012 4:14 UTC (Fri) by cmccabe (guest, #60281) [Link]

If POSIX compliance were the path to victory, we'd all be using AT&T's RFS now. (I tried to find a link, but apparently RFS doesn't even have a Wikipedia entry... sigh.)

Personally, I think AFS was a great system that should have been more widely adopted. It didn't get open sourced until much later, though, and the usual VHS vs. Betamax thing happened.

POHMELFS returns

Posted Feb 9, 2012 13:24 UTC (Thu) by jackb (guest, #41909) [Link]

Some people have great experiences with NFSv4. My experience has been that it's easy to encounter obscure bugs.

About a month ago I went through a period in which running emerge on any client, on a network where /home and /usr/portage are hosted on an NFS server, would randomly trigger lockd errors on all the other clients, requiring a hard reboot to resolve.

Then after a few weeks the problem went away. I'm not sure which update (kernel, nfs-utils, portage, or some other dependency) resolved the problem. I didn't change any configuration during this time. That basically describes my experience with NFS - it's good when it works but it's also prone to mysterious and inexplicable problems from time to time.

POHMELFS returns

Posted Mar 3, 2012 10:01 UTC (Sat) by TRS-80 (guest, #1804) [Link] (2 responses)

Have you looked at GlusterFS? It stores files on a regular filesystem, and then makes them network accessible, optionally clustering, striping and mirroring them.

POHMELFS returns

Posted Mar 5, 2012 23:46 UTC (Mon) by nix (subscriber, #2304) [Link] (1 response)

I haven't looked at it, though I've heard of it. I'll give it a look: at first sight it looks really rather nice.

POHMELFS returns

Posted Apr 13, 2012 19:15 UTC (Fri) by nix (subscriber, #2304) [Link]

No good. glusterfs cannot support inotify, because it is based on FUSE, and FUSE doesn't support inotify. Currently, it seems, only in-kernel distributed filesystems can support inotify -- and, as far as I can see, none of them do.

POHMELFS versus Ceph

Posted Feb 9, 2012 12:02 UTC (Thu) by abacus (guest, #49001) [Link] (4 responses)

Does anyone know how POHMELFS compares to Ceph?

POHMELFS versus Ceph

Posted Feb 10, 2012 4:58 UTC (Fri) by cmccabe (guest, #60281) [Link] (3 responses)

Ceph has a three-tier architecture: monitors, metadata servers, and object storage daemons.

The object storage layer of Ceph seems vaguely similar to Evgeniy's "elliptics network." However, there are some very important differences.

In Ceph, there is only ever one object storage daemon that has the authority to write to an object. In contrast, the elliptics network is based on the Chord paper from MIT.[1] So potentially there could be multiple writers to a single distributed hash table (DHT) object at once. In his 2010 paper [2], Evigeny describes the DHT model as "write-always-succeed and eventual consistency."
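
To make the Chord-style placement concrete, here is a toy sketch of the idea in [1] (not elliptics' actual code, and the node names are made up): nodes and keys hash onto the same ring, a key lives on its successor, and with replication the next few successors hold copies, which is where concurrent writers to one object can come from.

    import hashlib
    from bisect import bisect_left

    def ring_hash(value: bytes) -> int:
        # Map a node name or object key onto a 32-bit ring position.
        return int.from_bytes(hashlib.sha1(value).digest()[:4], "big")

    # Made-up node names; a real system would use addresses or IDs.
    nodes = sorted(ring_hash(n) for n in [b"node-a", b"node-b", b"node-c"])

    def successors(key: bytes, replicas: int = 2):
        # The key's home is the first node at or after its hash,
        # wrapping around the ring; the next replicas-1 nodes hold copies.
        h = ring_hash(key)
        i = bisect_left(nodes, h) % len(nodes)
        return [nodes[(i + j) % len(nodes)] for j in range(replicas)]

    print(successors(b"/foo/bar"))  # ring positions of the responsible nodes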

One of the biggest questions about any distributed system is how it handles "split-brain syndrome." In other words, what happens when we cut the network into two halves which cannot talk to one another? In Ceph, only one of those halves would be able to continue functioning. This is accomplished by the monitors, which use Paxos [3] to decide on changes in cluster topology. In contrast, in the elliptics network, it looks like both halves would continue functioning. Then later, if they were reunified, we would make some attempt to "merge the histories."
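
The rule the monitors enforce can be stated in one line. This is a sketch of the majority-quorum condition behind a Paxos deployment [3], not Ceph's actual code:

    def partition_may_proceed(reachable_monitors: int, total_monitors: int) -> bool:
        # A partition may keep making decisions only if it still holds
        # a strict majority of the full monitor set, so at most one
        # side of any split can proceed.
        return reachable_monitors > total_monitors // 2

    # A 5-monitor cluster split 3/2: only the 3-monitor side continues.
    print(partition_may_proceed(3, 5))  # True
    print(partition_may_proceed(2, 5))  # False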

Merging histories sounds good in theory, but in practice it's kind of a quagmire. What happens if one side of the brain decided to delete all the files in /foo and remove the directory, while the other side added files to that directory? Who should "win"? When the parallel universes implode into one, there are potentially going to be some unhappy users. To be fair, some users seem willing to accept this.
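
A toy replay, with made-up operations, shows why there is no right answer: whichever order the reunified cluster applies the two histories in, one user's work silently disappears.

    def replay(state, history):
        # Apply a log of (operation, path) pairs to a dict mapping
        # directories to sets of file names.
        for op, path in history:
            if op == "rmdir_recursive":
                state.pop(path, None)
            elif op == "create":
                d, name = path.rsplit("/", 1)
                state.setdefault(d, set()).add(name)
        return state

    left = [("rmdir_recursive", "/foo")]  # side A deleted /foo entirely
    right = [("create", "/foo/new.txt")]  # side B added a file to /foo

    # The replay order decides who "wins" after reunification:
    print(replay({"/foo": {"old.txt"}}, left + right))
    # {'/foo': {'new.txt'}}: the delete is partially undone
    print(replay({"/foo": {"old.txt"}}, right + left))
    # {}: new.txt vanishes and its author never saw an error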

Another issue is caching. Ceph has true POSIX read-after-write semantics. If you're on one computer and you write a byte to a file at offset 0, and then immediately afterwards someone on another computer reads a byte from offset 0, he'll see exactly what you wrote. In CAP terms [4], Ceph is a highly consistent system. In contrast, in his commit message, Evigeny says that POHMELFS will "only sync (or close with sync_on_close mount option) or writeback will flush data to remote nodes."

That actually seems to run counter to what I would call "strict" POSIX semantics. However, I've never seen a formal definition of POSIX filesystem semantics and my usage is kind of informal. If anyone has a document which clarifies it succinctly, I'd love to see it.
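
The practical difference between the two cache models is easy to sketch (an illustration, not either project's real code): a write-through client pushes every write to the storage nodes before acknowledging it, while a writeback client buffers locally and only flushes on sync() or close, so a reader on another machine can see stale data in between.

    class Cluster:
        # Stands in for the remote storage nodes.
        def __init__(self):
            self.data = {}

    class WriteThroughClient:
        # Every write reaches the cluster before it is acknowledged.
        def __init__(self, cluster):
            self.cluster = cluster
        def write(self, off, byte):
            self.cluster.data[off] = byte

    class WritebackClient:
        # Writes land in a local cache; only sync() pushes them out.
        def __init__(self, cluster):
            self.cluster, self.dirty = cluster, {}
        def write(self, off, byte):
            self.dirty[off] = byte
        def sync(self):
            self.cluster.data.update(self.dirty)
            self.dirty.clear()

    c = Cluster()
    WriteThroughClient(c).write(0, b"A")
    print(c.data.get(0))   # b'A': a reader on another node sees it at once

    c2 = Cluster()
    wb = WritebackClient(c2)
    wb.write(0, b"B")
    print(c2.data.get(0))  # None: invisible remotely until...
    wb.sync()
    print(c2.data.get(0))  # b'B'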

Full disclosure: I worked on the Ceph project for a while.

P.S. Evigeny, if anything I've said here is wrong, please let me know. Elliptics and POHMELFS seem like interesting projects and I'm always curious to see what you'll come up with in the future.

P.P.S. Evigeny, if you're reading this, do you have any ideas about avoiding replication storms?

[1] http://www.pdos.lcs.mit.edu/chord
[2] http://www.ioremap.net/tmp/lk2010-elliptics-text.pdf
[3] http://the-paper-trail.org/blog/?p=173
[4] http://en.wikipedia.org/wiki/CAP_theorem

POHMELFS versus Ceph

Posted Feb 16, 2012 17:13 UTC (Thu) by raalkml (guest, #72852) [Link] (2 responses)

> P.S. Evigeny, if anything I've said here is wrong

Er, yes. It is "Evgeniy" :)

POHMELFS versus Ceph

Posted Apr 6, 2012 5:46 UTC (Fri) by bradfitz (subscriber, #4378) [Link]

POHMELFS versus Ceph

Posted Apr 6, 2012 7:34 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Well, the newest Russian transliteration rules (I hate them) state that it should be "Evgeni" :)


Copyright © 2012, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds