
NFS: the early years

Posted Jun 21, 2022 23:44 UTC (Tue) by jwarnica (subscriber, #27492)
Parent article: NFS: the early years

NFS is proof-of-failure #1 for the basic theory of RPC: the idea that you can take a C-level (that is, asm-level) subroutine call and put it across the network without any further thought.
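As a toy illustration (not how Sun RPC or NFS is actually implemented; the host name, port and wire format below are invented), this is what that illusion looks like in Python: the caller sees a plain function, but every network failure mode leaks straight through it.

    import socket

    SERVER = ("fileserver.example", 2049)   # made-up host; 2049 happens to be the usual NFS port

    def read_block(handle, offset, length):
        # To the caller this looks like an ordinary subroutine call...
        with socket.create_connection(SERVER, timeout=2.0) as s:
            s.sendall(f"READ {handle} {offset} {length}\n".encode())
            return s.recv(length)           # ...but it can block, truncate or raise

    if __name__ == "__main__":
        try:
            data = read_block(42, 0, 4096)  # "just a function call"
        except OSError as e:
            # None of these failure modes exist for a genuinely local call.
            print("the network leaked through:", e)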

This put "the network is the computer" behind schedule by a decade, and triggered people to fix the idea by doubling down on ignoring the network in the form of CORBA, which contributed another decade of schedule slip.

Except as examples of what not to do, good riddance.

I'm sorry for the sysadmins who had to deal with this. I'm even more sorry for those who let it escape the Valley.



Remote filesystems: a lost cause?

Posted Jul 9, 2022 21:56 UTC (Sat) by marcH (subscriber, #57642) (6 responses)

Indeed the https://en.wikipedia.org/wiki/Fallacies_of_distributed_co... come to mind.

Does _any_ network filesystem "suck less"? The entire concept seems like a lost cause. Sure, a Samba share is marginally more convenient than rsync or "wget stuff.tgz" for the occasional recursive _download_, but that's still just glorified FTP. Is there any actual and successful use case where multiple clients are actively working on the same, shared tree at the same time?

Just for fun, time "git status" on a network filesystem and compare with a local filesystem. The thing that relies heavily on "state" is caching, and caching is what has made computers faster and faster. Is high-performance caching compatible with shared access over the network? It does not look like it.
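For anyone who wants to run that comparison, here is a minimal Python sketch (the two repository paths are placeholders for whatever local and NFS-mounted checkouts you have at hand):

    import subprocess
    import time

    REPOS = {
        "local": "/home/user/src/somerepo",   # hypothetical checkout on local disk
        "nfs":   "/project/src/somerepo",     # hypothetical checkout on an NFS mount
    }

    for label, path in REPOS.items():
        start = time.perf_counter()
        subprocess.run(["git", "-C", path, "status"],
                       stdout=subprocess.DEVNULL, check=True)
        print(f"{label}: {time.perf_counter() - start:.2f}s")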

I'm more and more convinced that the future belongs to application-specific protocols. The poster child is of course git, which is capable of synchronizing gigabytes while transferring kilobytes. Of course there is a price to pay: synchronization requires explicit user action, and remote resources do not magically appear as if they were local. But that's just acknowledging the network fallacies, and reality.

PS: NUMA and CXL seem similarly... "ambitious"

Remote filesystems: a lost cause?

Posted Jul 9, 2022 22:57 UTC (Sat) by atnot (guest, #124910)

> Does _any_ network filesystem "suck less"? The entire concept seems like a lost cause.

I think it gets worse, because all filesystems are secretly network filesystems. It doesn't really matter much whether the two computers speak to each other over Ethernet, SCSI or PCIe; you just notice it less with lower latency. Basing persistent state storage on what amounts to a shared, globally read-writable single address space without any transactions or, really, any well-defined concurrency semantics is, I think, a fundamental dead end. See also the symlink discussion. So much effort is thrown into pretending, at a very deep level, that files work like they did on a PDP-11, and I think it's really something that needs to be moved beyond. Git is actually a pretty good example I never thought of there, in the way it sort of emulates a traditional filesystem structure on top of a content-addressed key-value blob store.
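A toy sketch of that structure (greatly simplified compared to real git, which also has commits, packfiles and so on): blobs keyed by their own hash, plus a "tree" mapping names to hashes to recover a filesystem-like view.

    import hashlib

    store = {}                                   # hash -> content (the blob store)

    def put_blob(content):
        key = hashlib.sha1(content).hexdigest()  # content-addressed: the key is the hash
        store[key] = content                     # identical content is stored only once
        return key

    # A "tree" is just another mapping: file name -> blob hash.
    tree = {
        "README":     put_blob(b"hello\n"),
        "src/main.c": put_blob(b"int main(void) { return 0; }\n"),
    }

    def read_file(path):
        return store[tree[path]]                 # name lookup, then fetch by hash

    assert read_file("README") == b"hello\n"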

Remote filesystems: a lost cause?

Posted Jul 10, 2022 6:04 UTC (Sun) by donald.buczek (subscriber, #112892) (4 responses)

NFS doesn't suck.

We have ~400 Linux systems, ranging from workstations to really big compute servers, using a global namespace supported by autofs and NFS for /home and for all the places where our scientific data goes (/project). We even have software packages installed and used over NFS (/pkg) because, to make our data processing reproducible, we keep every version of every library or framework, and that would be too much to keep on every system. Only the base Linux system, copies of the current/default versions of some heavily used packages (e.g. Python) and some scratch space are on local disks.

We have ~500 active users and they usually don't complain about the performance or responsiveness of their desktops.

We have been doing this for decades. It used to be a pain, but with the progress of NFS and the steps we've taken here (e.g. replacing NIS), everything runs really smoothly these days! If it doesn't, some user has acted against the rules (e.g. hammering a fileserver from many distributed cluster jobs, or trying to up- or download a few terabytes without a speed limit) and we have a talk.

Oh, and the same namespace (/home, /project) is accessed over CIFS by our Windows and macOS workstations. Plus, the files are accessed locally on the fileservers, too. For example, we typically put daemons and cron jobs which don't need much RAM or CPU on the fileserver where their projects are local, to reduce network dependency. Also, users are allowed to log in to the HPC node where their job is executing, to monitor or debug its behavior.

And (related to other discussions) we couldn't live without symlinks. And all these systems are multi-user; a security boundary between users is a basic requirement.

NFS, symlinks and multi-user are not dead. "Nowadays things are done like this and that" might be true in a statistical sense, but it should not be generalized.

Remote filesystems: a lost cause?

Posted Jul 10, 2022 7:40 UTC (Sun) by marcH (subscriber, #57642)

Thanks! So mostly reads, no read/write concurrency, and rarely in the critical path? Then sure, as long as you have the tools to monitor and catch "deviant" users, why not. Not exactly plug-and-play / "consumer"-friendly though, and quite far from the illusion of a local disk.

Remote filesystems: a lost cause?

Posted Jul 18, 2022 16:08 UTC (Mon) by ajmacleod (guest, #1729) (2 responses)

Out of interest, what did you replace NIS with? I have likewise found NFS indispensable, but NIS has hung around a lot longer than it should have in some use cases, because for them it just works and is very simple (yes, too simple).

Remote filesystems: a lost cause?

Posted Jul 18, 2022 20:15 UTC (Mon) by donald.buczek (subscriber, #112892) (1 response)

We use self-developed tools to broadcast /etc files like passwd, group, exports or autofs maps to all systems [1]. For shadow, we replaced NIS with a self-developed NSS service, which queries a central server via TLS [2].

[1]: https://github.molgen.mpg.de/mariux64/mxtools/blob/master...

[2]: https://github.molgen.mpg.de/mariux64/mxshadow

Remote filesystems: a lost cause?

Posted Jul 22, 2022 15:48 UTC (Fri) by ajmacleod (guest, #1729)

That's very interesting, thank you for the reply.

