Linux needs better network file systems (NewsForge)

This NewsForge article covers a range of choices available for Linux network file systems. "Our current model of the network file system is defined by the paradigm of the enterprise workstation. In this model, a large enterprise has a number of knowledge workers based at a single campus, all using individual work stations that are tied together on a single local area network (LAN)."

Bad article

Posted Dec 3, 2004 22:08 UTC (Fri) by jwb (guest, #15467) [Link]

The article mentions two dead network file systems and then closes with Lustre, but doesn't mention that Lustre doesn't solve any of the other filesystems' problems. Lustre's authentication is just like NFS: as long as a machine can connect, its users can do whatever they want. Also, Lustre doesn't work over WANs or have the kind of recovery capability Coda and Intermezzo tried to implement.

In other words, the article tells us nothing new. It would have been more interesting to cover something like SFS, which uses SSH as the transport (solving the authentication problem) and localhost loopback NFS mounts for the actual filesystem implementation.

No mention of Novell

Posted Dec 3, 2004 22:16 UTC (Fri) by sphealey (guest, #1028) [Link]

Also no mention of Novell, who might have done just a _little_ bit of work in this area.

sPh

In defense of Lustre

Posted Dec 3, 2004 22:51 UTC (Fri) by emkey (guest, #144) [Link]

I won't claim to be an expert but I know for a fact that lustre does in fact have some amount of recovery/redundancy on the OST side, or so I've been told.

As for working over a WAN, I would think that latency is the enemy here. Do any other high-performance parallel file systems work well over WANs?

As for the rest, it hasn't been around that long. Give it time. You should of course keep security in mind from the beginning, but it adds a substantial amount of complexity to try to implement it up front. And anyone who thinks writing something like Lustre is easy to begin with, even with minimal security, doesn't know what they are talking about.

In defense of Lustre

Posted Dec 3, 2004 23:05 UTC (Fri) by jwb (guest, #15467) [Link]

What I mean is, Lustre doesn't attempt to operate disconnected, like Coda or Intermezzo do. If a Lustre system loses communications with a storage object, the entire Lustre filesystem will simply stop working. Which is fine, because Lustre isn't meant to be a handy feature for mobile laptop users, it's meant to spew extreme amounts of data into scientific clusters. As for working over a WAN, yes it can be slow, but in this case Lustre simply doesn't work. I don't know why, perhaps it operates directly over Ethernet interfaces, rather than over the IP stack.

In defense of Lustre

Posted Dec 3, 2004 23:27 UTC (Fri) by emkey (guest, #144) [Link]

That isn't what Lustre is intended to do. It is intended to provide petabyte plus file systems that can be read and written to at tens of gigabytes per second. (And is well on its way to getting there)

You might as well criticize a Semi for making a lousy golf cart...

Lustre is specifically aimed at the HPC space. It's total overkill anyplace else.

In defense of Lustre

Posted Dec 3, 2004 23:29 UTC (Fri) by emkey (guest, #144) [Link]

I'm fairly sure it does use IP BTW.

I couldn't tell you why it doesn't work well over a WAN, other than to speculate, as I did earlier, that latency would likely be a major problem given the complexity of the task they are trying to accomplish.

Bad article

Posted Dec 4, 2004 0:26 UTC (Sat) by mdekkers (guest, #85) [Link]

Also, it glosses over AFS, while OpenAFS does all the author is asking for, and more. What a waste of time...

Bad article

Posted Dec 4, 2004 3:15 UTC (Sat) by Zarathustra (guest, #26443) [Link]

How often what someone asks for is the last thing he needs...

OpenAFS

Posted Dec 7, 2004 6:31 UTC (Tue) by eru (subscriber, #2753) [Link]

I, too, was puzzled that OpenAFS is not discussed (not in the article and not even here in LWN comments). I would really like to hear about its advantages/disadvantages compared to the others mentioned. Some people apparently like it on Linux: the Scientific Linux distribution (http://www.scientificlinux.org/), which mostly is one of the RHEL3 "clones", supports it in the base system.

Bad article

Posted Dec 5, 2004 2:51 UTC (Sun) by clint (subscriber, #7076) [Link]

SFS does not use SSH for transport; this is a good thing.

Bad article

Posted Dec 6, 2004 5:30 UTC (Mon) by stevef (subscriber, #7712) [Link]

> The article mentions two dead network file systems
Not sure how dead we can consider SMB/CIFS; it is still more functionally rich than any alternative, although that complexity also makes it hard to implement. Recently jra and I added remote "POSIX ACL" support to Samba and the Linux cifs client respectively. Direct support was just added to CIFS, and I expect that by 2.6.11 the Linux cifs client's performance to Samba will be competitive with NFS, and better in a few areas. We should be able to get better performance to Windows than Windows's own clients (that will be fun but not trivial - Microsoft is improving performance more quickly than many realize). There are many other functional areas that will be fairly easy to add in as these Linux cifs client performance fixes get in.

And NFSv4 is certainly not dead; if anything it has a vibrant community working on it - almost as large as the Samba team. Although NFSv4 lacks a few AFS/DFS features, the longer term (mid next year) story for NFSv4 looks pretty good, and in some interesting ways it reminds me more of CIFS than it does of NFSv3 (with NFSv4's improved security model, more stateful approach to opens, and its new ACL model, and named attributes - lots of similarities).

In some sense the most important issue is what is standardized, since the success of distributed filesystems depends on other applications, management libraries, smart routers and not just the filesystem itself. NFSv4 is reasonably well documented and has been the official IETF network filesystem standard for more than a year now. SMB/CIFS, although extensively documented, has not been an official standard since 1992 (when an earlier version was the X/Open standard for PC "interworking"), and everything else is even farther from that.

It would help Lustre and/or other cluster filesystems to work with SNIA or IETF to get official standards documents worked through the process.

This is also something I and jra etc. need to work through for the POSIX extensions for CIFS.

Linux needs better network file systems (NewsForge)

Posted Dec 3, 2004 23:38 UTC (Fri) by josh_stern (guest, #4868) [Link]

The type of next generation file system they are talking about should
let one ask for a file (contents) by its pseudo-unique hash (e.g. SHA1)
rather than its local directory location.
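
To make the idea concrete, here is a minimal sketch of asking for content by hash instead of by path. It is only an illustration: the ContentStore class and its put/get methods are hypothetical names, the store lives in memory, and nothing here describes how any existing file system does it.

```python
import hashlib


class ContentStore:
    """Toy in-memory store that names each blob by the SHA-1 of its bytes."""

    def __init__(self):
        self._blobs = {}  # hex digest -> bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha1(data).hexdigest()
        # Identical content hashes to the same key, so it is stored once.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Re-hash on read so a corrupted blob is detected, not returned.
        if hashlib.sha1(data).hexdigest() != digest:
            raise ValueError("stored content does not match its hash")
        return data


store = ContentStore()
key = store.put(b"quarterly report")
same_key = store.put(b"quarterly report")  # a second copy from another machine
```

Both put calls return the same key, so the caller never needs to know which machine or directory holds the bytes, and a duplicate upload costs no extra storage.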


Linux needs better network file systems (NewsForge)

Posted Dec 3, 2004 23:53 UTC (Fri) by uriel (guest, #20754) [Link]

You mean like venti?

http://plan9.bell-labs.com/sys/doc/venti/venti.html

Linux needs better network file systems (NewsForge)

Posted Dec 4, 2004 5:21 UTC (Sat) by josh_stern (guest, #4868) [Link]

Yes that is close to what I mean. But pace what they say, a file system
that could only be read only would be reserved for special uses in most
organizations. I think that rather than just stipulate that every file is
write once, the system should support particular types of
pointers/references/links to files, where a link can explicitly be made
either de re (referring to the current file content) or de dicto (referring
to whatever is at a given symbolic location, analogous to a web page that
is regularly updated). The activity of making a de re link would create a
version of the file that would be permanent and read only so long as the
linking system object itself persisted, while making a de dicto link would
identify a particular file as part of a revision series so that one can
ask for the current version without knowing the unique id for that not yet
created version at the time of linking.
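
A rough sketch of the two link types described above, under stated assumptions: VersionedStore and all of its method names are invented for illustration, everything is kept in memory, and the rule that a pinned version lives only as long as the linking object is not modeled (nothing is ever garbage collected here).

```python
import hashlib


class VersionedStore:
    """Toy store with content-pinned (de re) and name-tracking (de dicto) links."""

    def __init__(self):
        self._blobs = {}    # hex digest -> bytes, never overwritten
        self._series = {}   # symbolic name -> list of digests, oldest first

    def write(self, name: str, data: bytes) -> str:
        digest = hashlib.sha1(data).hexdigest()
        self._blobs.setdefault(digest, data)
        self._series.setdefault(name, []).append(digest)
        return digest

    def link_de_re(self, name: str) -> str:
        # Pin the content that `name` holds right now; the link is its hash.
        return self._series[name][-1]

    def link_de_dicto(self, name: str) -> str:
        # Track the name itself; resolution is deferred until read time.
        if name not in self._series:
            raise KeyError(name)
        return name

    def read_de_re(self, digest: str) -> bytes:
        return self._blobs[digest]

    def read_de_dicto(self, name: str) -> bytes:
        # Follow the revision series to whatever version is newest now.
        return self._blobs[self._series[name][-1]]


fs = VersionedStore()
fs.write("/docs/plan.txt", b"draft 1")
pinned = fs.link_de_re("/docs/plan.txt")
tracking = fs.link_de_dicto("/docs/plan.txt")
fs.write("/docs/plan.txt", b"draft 2")
```

After the second write, reading through pinned still yields the first draft, while reading through tracking yields the second, even though that version did not exist when the link was made.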


Linux needs better network file systems (NewsForge)

Posted Dec 4, 2004 0:03 UTC (Sat) by piman (subscriber, #8957) [Link]

Why? I mean, there's a decent case for storing files like that internally. But why should it expose that implementation to anyone?

Linux needs better network file systems (NewsForge)

Posted Dec 4, 2004 0:16 UTC (Sat) by josh_stern (guest, #4868) [Link]

The linked article is talking about improving file system design for user contexts where people
are moving about from machine to machine and possibly domain to domain. Also they are
using devices that are not suitable as file servers because the devices are not always
connected and may have low storage capabilities. Also, large organizations benefit from
not needlessly replicating identical content (think of a paradigm in which the email
attachment is just a link). So it makes sense to have some way of naming and referencing
content that is totally independent from storage location.




Linux needs better network file systems (NewsForge)

Posted May 3, 2007 2:46 UTC (Thu) by whardier (guest, #45042) [Link]

I'm working on this at the moment. Please take a look at "crush" on sf.net, also #crush on freenode.net

Shane - hardwire

Linux needs better network file systems (NewsForge)

Posted Dec 4, 2004 0:15 UTC (Sat) by braam (guest, #14522) [Link]

Lustre in fact has a rather long roadmap that may cure some of the shortcomings mentioned here.

Security

We have a kerberized version with POSIX ACL's that we are testing now. This is capable of crossing remote realms by handling uid/gid translations. Moreover, this will be able to encrypt/decrypt file contents (and later also directory entries), on the clients, with a group key sharing system. Audit logs will also be available. This security was in fact designed before any code was written and has been painless to implement.

Disconnected Operation & Proxies

There is also development for Lustre to handle disconnected operation - in more generality in fact than with intermezzo and coda. Lustre will provide "remote" proxy services which when run on a client system lead to a persistent cache. The lock granularity between the proxy and the master is more coarse than between client and the servers the client uses, without loss of semantics. This cache is designed to handle disconnected operation and reintegrations after reconnection well; part of it works as a prototype.

We will also implement a metadata writeback cache that is fully in memory (similar to tmpfs) with a replication log to support reintegration.

Redundancy

As to redundancy, there are fully operational failover OSS servers (they serve file data) and the MDS is failover as well (with fully transparent completion of system calls after failover to the backup server). Drivers which provide RAID1 redundancy among servers (at the file data object level) are in prototype. We are planning to offer other redundancy schemes to allow the use of commodity disks with availability similar to what you get with failover shared storage, which currently is typically implemented with FC SAN's behind the OSS & MDS servers.

Networking

Lustre runs over TCP, Quadrics and over 3 different versions of Infiniband drivers today (and over some other networks like Myrinet), using DMA when supported by the network. It has given throughput of 1.5GB/sec over 2x 10GIGE WAN links, but I expect some tuning of the network stack for WAN use will be desirable.

Snapshots & versioning

We have betas of snapshot support and will also introduce a variety of versioning support for Lustre. The snapshot support is COW with rollback and provides extremely efficient incremental backup support.

Management

Dynamic addition of storage servers is not far out. Hot migration of data is next and we are working on a suite of gui tools for management.

The core of the HP SFS product is Cluster File Systems Lustre code and Cray will be offering several lines of systems with Lustre.

We are really primarily focussed on increasing Lustre's reliability and ease of use to enhance customer satisfaction, so these features will be coming out somewhat gradually. I'm sorry we've had so little time to keep people informed about where we are heading.

Peter Braam

Clash of Titans?

Posted Dec 4, 2004 2:18 UTC (Sat) by ccyoung (guest, #16340) [Link]

This sounds, quite frankly, sensational. And, for better or worse, I am equally or more so blown away by Reiser's work.

I hope somewhere in the clouds you guys start talking to each other someday before we have a clash of the Titans two years from now.

Clash of Titans?

Posted Dec 4, 2004 3:11 UTC (Sat) by Zarathustra (guest, #26443) [Link]

People are so easily impressed by fluff and hype...

Simplicity, clarity and generality? nah, why do it simple when you can do it complicated?

Why fly with wings when you can instead spend your time building yet another baroque square wheel with glass tyres?

Clash of Titans?

Posted Dec 4, 2004 6:41 UTC (Sat) by emkey (guest, #144) [Link]

I suspect this is because some problems don't lend themselves to simple solutions.

Clash of Titans?

Posted Dec 4, 2004 14:30 UTC (Sat) by Zarathustra (guest, #26443) [Link]

Some people refuse to see and accept the simple solutions... complexity is always more fascinating.

Clash of Titans?

Posted Dec 4, 2004 15:33 UTC (Sat) by emkey (guest, #144) [Link]

I don't totally disagree with your point. I've often been annoyed at people who implement complicated and obfuscated solutions because they think it makes them look smart. In this case though I think you're wrong. Designing a filesystem that is scalable to the levels that Lustre is capable of is not a trivial task.

I'm willing to be proven wrong on that, but you're either going to have to write a lot of code or point me in the direction of someone who has.

Linux needs better network file systems (NewsForge)

Posted Dec 4, 2004 19:11 UTC (Sat) by deatrich (subscriber, #25) [Link]

Peter, do you have any comment on whether the publicly available GPL version of Lustre will ever advance beyond 1.0.x? I've been looking into cluster FS options, and I must admit that finding the bug-fixed version (1.2.x) and the beta version (1.3.x) hidden behind a wall gives me pause. Even though Lustre is often spoken of as 'GPLed' I am beginning to think it means that the GPL version will only ever be the initial implementation.

Linux needs better network file systems (NewsForge)

Posted Dec 6, 2004 7:24 UTC (Mon) by pschwan (guest, #7699) [Link]

Good question. We have spoken many times of our commitment to release a given version of Lustre to the general public, under the GPL, within 1 year after the private release to our customers and partners.

The next such public release, from the 1.2.x series, will be made some time in February. It will likely be a quite modern version -- perhaps 1.2.4 -- which was released less than 5 months ago.

People might also find the recently-updated Lustre FAQ useful: http://www.clusterfs.com/faq.html

-Phil Schwan

Linux needs better network file systems (NewsForge)

Posted Dec 4, 2004 20:58 UTC (Sat) by nix (subscriber, #2304) [Link]

If we ever needed an argument against restricting guest contributions, this post is such an argument. :)

Linux needs better network file systems (NewsForge)

Posted Dec 4, 2004 22:38 UTC (Sat) by nicoruiz (subscriber, #25546) [Link]

> If we ever needed an argument against restricting guest contributions,
> this post is such an argument. :)

Quite the opposite. Did you notice that Peter Braam's comment was also made as a guest? That is precisely the kind of comment I would hate to lose.

Linux needs better network file systems (NewsForge)

Posted Dec 5, 2004 3:02 UTC (Sun) by Los__D (guest, #15263) [Link]

Which is exactly what he said... :)

Linux needs better network file systems (NewsForge)

Posted Dec 6, 2004 2:31 UTC (Mon) by linuxbox (subscriber, #6928) [Link]

Sounds like sorting the marketing hype from the actual product is a major time investment with Lustre.

Plus, as another post mentions, this is a (very costly) commercial product, not a solution to the problems of free software users. Possibly that would change if someone like Red Hat ransoms it, but somehow I'm doubtful.

Linux needs better network file systems (NewsForge)

Posted Dec 4, 2004 4:28 UTC (Sat) by Zarathustra (guest, #26443) [Link]

A better distributed file system is coming to Linux:

v9fs: http://v9fs.sourceforge.net/

With the elegant and simple design that one expects from the same people that created Unix and C.

Plan 9 switched paradigms a long time ago; maybe there is still a chance for Linux to catch up.

Linux needs better network file systems (NewsForge)

Posted Dec 4, 2004 6:22 UTC (Sat) by ncm (subscriber, #165) [Link]

Hey, this is exciting too! Thanks for the pointer.

Linux needs better network file systems (NewsForge)

Posted Dec 4, 2004 6:48 UTC (Sat) by emkey (guest, #144) [Link]

Sounds interesting. Based on a few minutes' reading it seems very scalable from the perspective of adding additional space. However, I don't think it would work that well if, say, thousands of people decided they wanted the same file, or several files from the same server, at the same time.

Linux needs better network file systems (NewsForge)

Posted Dec 5, 2004 21:15 UTC (Sun) by sbergman27 (guest, #10767) [Link]

While I'm all for elegance and simplicity, I'm pragmatic enough to give big points for the virtue of "current availability", and hence I must ask the question: Why should v9fs not be considered another in a long line of very promising half-finished network filesystems for Linux?

What's different about it?

Will I really, and practically, be able to use this for Linux to Linux file sharing? When? Linux to *BSD? (Your link suggests "yes".) When? Solaris? IRIX? HP/UX? MacOS X? Windows? (I'll take interoperability with all my mission critical Plan9 servers as a given. ;-)

For the NFS family of filesystems, the answers to all these questions are "yes" and "now".

Linux needs better network file systems (NewsForge)

Posted Dec 6, 2004 3:23 UTC (Mon) by Zarathustra (guest, #26443) [Link]

Yes, and now: http://www.vitanuova.com/inferno/net_download4T.html

NFSv4

Posted Dec 4, 2004 9:32 UTC (Sat) by geertj (subscriber, #4116) [Link]

The article left out the most probable (IMHO) de-facto standard in networked file systems for quite some time to come: NFSv4.

Despite its late arrival, it is really getting there now. Red Hat supports it with its Fedora Core distribution and its Enterprise Linux 4 betas. Big storage vendors like Network Appliance support it as well. A fully commercially supported NFSv4 client and server will probably happen within 6 months or so.

The only thing that NFSv4 does not bring you is disconnected operation. For the rest it is very complete. It offers security, WAN optimization, client side caching, ACLs and extended attributes.

NFSv4

Posted Dec 4, 2004 14:45 UTC (Sat) by Zarathustra (guest, #26443) [Link]

I was looking into NFSv4 the other day.

Still full of hacks and contortions to allow it to run over UDP... some people never learn!

And still built on top of that RPC monster... but we hardcode the port numbers so we can get thru firewalls!

And hey, why use standard transport encryption like TLS when we can come up with our own pseudo-"standard" hack?
(It's amazing the amount of "standards" that NFSv4 uses, that aren't used by anyone else, and probably never will)

ACLs based in the WinNT model? wonderful.

And hey, NFSv3 wasn't complex enough, so we will make NFSv4 four times as complicated, shame on us if we can't outdo ourselves. Let's hope they did it right this time and all NFS versions die the death that they have always deserved. So far it looks good, over six years in development and still nowhere even near being as usable as NFSv3 (which isn't much).

NFSv4

Posted Dec 4, 2004 19:44 UTC (Sat) by sbergman27 (guest, #10767) [Link]

1. UDP hacks: I thought that the standard mandated a protocol which handled congestion control? (i.e. tcp)

2. RPC monster: Is RPC really that bad? Having a well known port is certainly a step in the right direction.

3. Pseudo-Standards: Like it or not, if NFSv4 uses these protocols, they will become standard.

4. ACLs: I forgot. We're Unix/Linux. We don't have to worry about being compatible with anyone else. They'll make darn sure they are compatible with us. Right? Some of us have to back up Windows boxes, however distasteful we might find that to be.

5. Added complexity: Hmmm... haaah... errrr... well... It's not that... err... uhhh... bad, uhmmm... is it? ;-)

NFSv4

Posted Dec 4, 2004 20:49 UTC (Sat) by Zarathustra (guest, #26443) [Link]

1. UDP hacks: oh, mandatory congestion control? Wonderful, that leaves out quite a few perfectly good transports. Anyway, I don't know about practice, but the various docs I have read explicitly state that the justification for various hideousness in the protocol is the need to deal with UDP's lack of guarantees WRT message order.

2. RPC monster: Is RPC really that bad? No, it's worse. But hey, maybe it's not that bad compared with some of the monsters that _use_ it; like NIS or NFS...

3. Pseudo-Standards: uh? So now tell me how many people use RPC just because NFSv3 uses it? Thank god no one (outside Sun) is stupid enough to use that crap!

4. ACLs: I forgot. We're Unix/Linux. We don't do things right, we just copy whatever crap someone else has come up with before.

5. Added complexity:
Controlling complexity is the essence of computer programming. -- Brian Kernighan
The computing scientist's main challenge is not to get confused by the complexities of his own making. -- E. W. Dijkstra

NFSv4

Posted Dec 5, 2004 17:40 UTC (Sun) by geertj (subscriber, #4116) [Link]

> Still full of hacks and contortions to allow it to run over UDP... some people never learn!

The only reference the RFC makes to UDP is in combination with the r_addr / r_netid structures. Do you have something specific in mind?

> And still built on top of that RPC monster... but we hardcode the port numbers so we can get thru firewalls!

SUN (aka ONC) RPC is a minimal implementation of a remote procedure call framework. In fact, it's so minimal that I think if you would leave out one single field of its request/response structures, the whole thing wouldn't work anymore. The claim that it's a "monster" is surely not justified by its complexity. Do you have something else in mind that would classify SUN RPC as a "monster"?

> And hey, why use standard transport encryption like TLS when we can come up with our own pseudo-"standard" hack? (It's amazing the amount of "standards" that NFSv4 uses, that aren't used by anyone else, and probably never will)

TLS uses X509 certificates for authentication. Because these are not suitable for all circumstances, authentication at the application level would have been required if NFS had used TLS.

ONC RPC provides a pluggable slot for so-called "authentication flavours", that can perform integrity and privacy protection as well. By default, both certificate based authentication (using SPKM3/LIPKEY) and symmetric key based authentication (using GSSAPI/Kerberos) are available in NFSv4.

> ACLs based in the WinNT model? wonderful.

This is quite wonderful indeed. The semantics of Windows NT ACLs are richer than the (abandoned draft standard) POSIX ACLs. This means that any POSIX ACL can be expressed by a WinNT ACL, but not the other way around.

There is a draft RFC (draft-ietf-nfsv4-acl-mapping-02.txt) specifying how to map POSIX ACLs to NFSv4 ACLs. NFSv4 simply implemented the most generic ACL semantics that are currently in use.
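
As a very rough illustration of why the mapping only works in one direction, the sketch below translates POSIX rwx letters into NFSv4 access-mask bits and back. The bit values are the ACE4_* constants from RFC 3530; the letter-to-bit table and both function names are simplifying assumptions made for this example, not the algorithm in the draft, which also has to deal with ALLOW/DENY ordering, the POSIX mask entry and more.

```python
# Access-mask bit values as defined for NFSv4 ACEs in RFC 3530.
ACE4_READ_DATA = 0x00000001
ACE4_WRITE_DATA = 0x00000002
ACE4_APPEND_DATA = 0x00000004
ACE4_EXECUTE = 0x00000020

# Simplified, assumed mapping from POSIX permission letters to mask bits.
POSIX_TO_NFS4 = {
    "r": ACE4_READ_DATA,
    "w": ACE4_WRITE_DATA | ACE4_APPEND_DATA,
    "x": ACE4_EXECUTE,
}


def posix_perms_to_mask(perms: str) -> int:
    """Translate a string such as "rw-" into an NFSv4-style access mask."""
    mask = 0
    for letter in perms:
        if letter != "-":
            mask |= POSIX_TO_NFS4[letter]
    return mask


def mask_to_posix_perms(mask: int):
    """Return an rwx string, or None if the mask has no exact POSIX form."""
    perms = ""
    for letter in "rwx":
        bits = POSIX_TO_NFS4[letter]
        if mask & bits == bits:
            perms += letter
            mask &= ~bits
        elif mask & bits:
            return None  # e.g. append-only: only part of "w" is granted
        else:
            perms += "-"
    # Any leftover bits have no POSIX counterpart in this toy table.
    return perms if mask == 0 else None


rw_mask = posix_perms_to_mask("rw-")
round_trip = mask_to_posix_perms(rw_mask)
append_only = mask_to_posix_perms(ACE4_APPEND_DATA)
```

Every rwx combination converts to a mask and back unchanged, but a finer-grained mask such as append-only has no rwx equivalent, so mask_to_posix_perms returns None for it.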

> And hey, NFSv3 wasn't complex enough, so we will make NFSv4 four times as complicated, shame on us if we can't outdo ourselves.

NFSv3 was a very simple protocol. That's why clients were more complicated to write, if they wanted to offer the traditional Unix file system semantics. NFSv4 is a bit more complex indeed, but it also offers much broader functionality.

Linux needs better network file systems (NewsForge)

Posted Dec 4, 2004 21:26 UTC (Sat) by bluefoxicy (guest, #25366) [Link]

I was working on a replacement for NFS about a year ago (abandoned it).

http://foxfs.sourceforge.net/anlfs/

It's immature and fugly; the caching of security extensions has to be redone (I can do that easily-- just make the server able to push "security changed" messages down to the host, and let the host just cache if access was denied. Granted access must be repeatedly rechecked to avoid races!)

I can start over again and kick out another one if it's really needed.
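
The caching rule described above (cache denials only, recheck grants every time, let the server push "security changed" messages) could look roughly like the sketch below. It is not taken from the anlfs code: FakeServer, Client and every method name are hypothetical, and the "network" is just direct method calls.

```python
class FakeServer:
    """Stand-in for the file server; holds the authoritative permissions."""

    def __init__(self):
        self.allowed = set()   # (user, path) pairs that may be accessed
        self.clients = []
        self.checks = 0        # number of access checks the server answered

    def check(self, user, path):
        self.checks += 1
        return (user, path) in self.allowed

    def grant(self, user, path):
        self.allowed.add((user, path))
        # Push a "security changed" message so stale denials are dropped.
        for client in self.clients:
            client.on_security_changed(path)


class Client:
    def __init__(self, server):
        self.server = server
        self.denied = set()    # cached denials only; grants are never cached
        server.clients.append(self)

    def can_access(self, user, path):
        if (user, path) in self.denied:
            return False       # answered locally, no round trip
        ok = self.server.check(user, path)
        if not ok:
            self.denied.add((user, path))
        return ok

    def on_security_changed(self, path):
        self.denied = {(u, p) for (u, p) in self.denied if p != path}


server = FakeServer()
client = Client(server)
first = client.can_access("alice", "/srv/data")    # asks the server, denied
second = client.can_access("alice", "/srv/data")   # served from denial cache
checks_after_denials = server.checks
server.grant("alice", "/srv/data")                 # server pushes invalidation
third = client.can_access("alice", "/srv/data")    # asks the server again
fourth = client.can_access("alice", "/srv/data")   # grants rechecked each time
```

Because grants are never cached, a revocation takes effect on the very next check without any push; the push is only needed to clear stale denials after access is granted.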

Linux needs better network file systems (NewsForge)

Posted Dec 6, 2004 14:59 UTC (Mon) by Seegras (subscriber, #20463) [Link]

I am actually looking for an encrypted network-filesystem. Content encrypted on disk and in transfer; you can only get something useful out of it if the client(-computer) has the correct key.

Linux needs better network file systems (NewsForge)

Posted Dec 6, 2004 18:38 UTC (Mon) by josh_stern (guest, #4868) [Link]

You might consider just passing regular files to the server
that are actually encrypted virtual file systems:

http://www.tldp.org/HOWTO/Loopback-Encrypted-Filesystem-H...


Linux needs better network file systems (NewsForge)

Posted Dec 9, 2004 13:00 UTC (Thu) by nix (subscriber, #2304) [Link]

Alternatively, you could use encrypted loopback mounts on the server, and serve everything over SFS.

Rubbish

Posted Dec 7, 2004 9:59 UTC (Tue) by janpla (guest, #11093) [Link]

Linux doesn't need a better network file system. Some users may wish to have one, but Linux doesn't need it. The thing here is that file systems are just a certain sort of application, really. And I feel when people begin to talk about 'Linux needs ...', they assume that we would all want to, or perhaps more likely that everyone should be forced to use their preferred style of whatever.

I personally have no need for more complicated security involving access control lists and what have you.

Rubbish

Posted Dec 7, 2004 14:27 UTC (Tue) by josh_stern (guest, #4868) [Link]

"And I feel when people begin to talk about 'Linux needs ...', they assume
that we would all want to, or perhaps more likely that everyone should be
forced to use their preferred style of whatever."

IMO, that is not the correct interpretation of what that phrase means.
It's more like, "Linux doesn't have X currently in usable form and we
think there is a demand for it". Some people might also mean "Linux
doesn't have X and more people would use Linux if it did". Linux is so
configurable that it is almost never the case of a new option forcing a
change in existing usage patterns, even for new kernels (which of course
nobody is forcing anyone to upgrade to).


Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds