Sometime around the end of January or early February, the Internet Engineering Task Force will
give its final blessing to the latest
version of the venerable Network File System (NFS), version 4.1. While the authors of the standard have stressed that this is a minor
revision of NFS, it does have at least one seemingly radical new option,
called Parallel NFS (pNFS).
The "parallel" in pNFS means NFS clients can access
large pools of storage directly, rather than going through the storage
server. Unbeknownst to the clients, what they store is striped across
multiple disks, so when that data is needed it can be read back in
parallel, cutting retrieval time even further. If you run a cluster
computer system, you may immediately recognize the appeal of this approach.
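To make the striping idea concrete, here is a minimal sketch of the arithmetic a striped layout implies. The stripe size, device count, and function name are illustrative inventions for this sketch, not values taken from the NFSv4.1 specification:

```python
# Sketch: how striped storage maps a file byte offset to a device.
# STRIPE_UNIT and NUM_DEVICES are hypothetical parameters.

STRIPE_UNIT = 64 * 1024   # bytes per stripe unit
NUM_DEVICES = 4           # disks the file is striped across

def locate(offset):
    """Return (device index, offset within that device) for a file offset."""
    unit = offset // STRIPE_UNIT     # which stripe unit the byte falls in
    device = unit % NUM_DEVICES      # round-robin placement across devices
    # A given device holds every NUM_DEVICES-th stripe unit:
    local = (unit // NUM_DEVICES) * STRIPE_UNIT + offset % STRIPE_UNIT
    return device, local

print(locate(0))          # (0, 0)      -> first 64K on device 0
print(locate(65536))      # (1, 0)      -> next 64K on device 1
print(locate(262144))     # (0, 65536)  -> wraps back to device 0
```

Because consecutive stripe units land on different devices, a large sequential read touches all four devices at once, which is where the parallel speedup comes from.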
"We're starting the process of feeding all these patches up to the
Linux NFS maintainers," said Brent Welch, the director of
software architecture at Panasas and one of the storage
company's contributors to the pNFS code. He noted that the work of
prototyping and implementing pNFS in Linux, as part of NFS, has been
going on for about two years, and has included updating both the NFS
client and the NFS server software.
The code will be proposed for the Linux kernel in two sets, according to
Welch. The first set will have the basic procedures for setting up and
tearing down pNFS sessions, using Remote Procedure Call (RPC) operations
for exchanging IDs and initiating and ending sessions. The development teams are gunning to have
this basic outline of pNFS included in the 2.6.30 version of the kernel. The second set, targeted at the 2.6.31 version of the
kernel, will be a larger patch, including the I/O commands for accessing
and changing file layouts as well as reading and writing data. Given that it will take a few more months after the 2.6.31 kernel for the code to be picked up by the major distributions, pNFS probably won't be deployed by even the most ambitious IT shops until at least early 2010.
We all know NFS. It allows client machines to mount Unix drives that
reside across the network as if they were local disks. Many Network
Attached Storage (NAS) arrays use NFS. With NAS, a large number of
hard drives sit behind a single IP address, all managed by the NAS box.
NAS allows organizations to pool storage, so storage administrators
can more fluidly (and hence efficiently) allocate that storage across
the organization.
In a 2004 problem
statement, two of the developers responsible for getting pNFS in
motion, Panasas chief technology officer Garth Gibson and Network
Appliance (NetApp) engineer Peter Corbett, explained the limitations of this
approach, especially in high performance computing environments:
The storage I/O bandwidth requirements of clients
are rapidly outstripping the ability of network file servers to supply
them. [...] The NFSv4 protocol currently requires that all the data in a
single file system be accessible through a single exported network
endpoint, constraining access to be through a single NFS server.
In a nutshell, the potential roadblock with NAS, or any type of
NFS-based network storage, is the NAS head, or server, they explained.
If too many of your clients hit the NAS server at the same time, then the
I/O slows for everyone. You could go back to direct access, but you lose
the efficiencies of pooled storage. For cluster computer systems, in
which dozens of nodes can be working on the same data set, such
partitioned storage just isn't feasible. Nor are multiple storage
servers: An NFS-based system cannot support multiple servers writing to
the same file system.
Gibson and Corbett were early champions of developing pNFS, along with
Los Alamos National Laboratory's Gary Grider. Additional work was
carried out by engineers at EMC, Panasas, NetApp and other companies.
The University of Michigan's Center for Information
Technology Integration (CITI), along with members of the IBM Almaden
Research Center, are developing a
pNFS implementation for Linux, both for clients and storage servers.
pNFS will allow clients to connect
directly to the storage devices they
need, rather than go through a storage gateway of some sort. The folks
behind pNFS like to say that their approach separates the
control traffic from the data traffic. When a client requests a particular
file or block of storage,
it sends a request to a server called the Metadata Server (MDS), which
returns a map of where all the data
resides within the storage network. The client can then access that data directly, according to permissions set by the file system. Once that
storage is altered, the client notifies the MDS of the changes, and the MDS updates the file layout.
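The request flow just described can be sketched in a few lines of Python. Everything here, the class names, the methods, and the round-robin placement, is an illustrative stand-in for the real NFSv4.1 operations, not the actual protocol:

```python
# Sketch of the pNFS control/data split: the client asks a metadata
# server (MDS) for a layout, does I/O directly against the data servers,
# then reports its changes back to the MDS.

class MetadataServer:
    def __init__(self):
        self.layouts = {}                # filename -> list of device ids

    def layout_get(self, filename):
        """Return the map of where the file's stripes live."""
        return self.layouts.setdefault(filename, [0, 1, 2])

    def layout_commit(self, filename, devices):
        """Client reports changes so the MDS can update the file layout."""
        self.layouts[filename] = devices

class Client:
    def __init__(self, mds, data_servers):
        self.mds = mds
        self.data_servers = data_servers # device id -> storage dict

    def write(self, filename, chunks):
        devices = self.mds.layout_get(filename)      # control traffic
        for i, chunk in enumerate(chunks):           # data traffic,
            dev = devices[i % len(devices)]          # striped round-robin
            self.data_servers[dev][(filename, i)] = chunk
        self.mds.layout_commit(filename, devices)    # control traffic

    def read(self, filename, nchunks):
        devices = self.mds.layout_get(filename)
        return [self.data_servers[devices[i % len(devices)]][(filename, i)]
                for i in range(nchunks)]

mds = MetadataServer()
servers = {0: {}, 1: {}, 2: {}}
client = Client(mds, servers)
client.write("big.dat", [b"aa", b"bb", b"cc", b"dd"])
print(client.read("big.dat", 4))   # [b'aa', b'bb', b'cc', b'dd']
```

The point of the split is visible in the code: the MDS is only touched twice per operation, while the bulk data moves directly between the client and the data servers.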
Since pNFS allows clients to talk directly to the storage devices, as well as permitting client data to be
striped across multiple storage devices, the client can enjoy a higher I/O rate than would be had simply by going through a single NAS head—or by
communicating with a single storage server. In 2007, three developers from
the IBM Almaden Research Center, Dean Hildebrand, Marc Eshel and Roger
Haskin, demonstrated [PDF]
at the Supercomputing 2007 conference (SC07) how three clients could saturate a 10 gigabit
link by drawing data from 336 Linux-based storage devices. Such
throughput "would be hard to achieve using standard NFS in terms of
accessing a single file," Hildebrand said. "We wanted to
show that pNFS could scale to the network hardware available."
pNFS is largely made up of three sets of protocols. One protocol is for the
mapping, or layout, of resources, which resides on the client. It interprets and utilizes the data map returned from the
metadata server. The second is the transport protocol, which also
resides on the client; it handles the actual I/O between the client
and the storage devices. The third is a control protocol, which
synchronizes the metadata server with the storage devices. This last
protocol is the only one not specified by NFS—it is left to the storage
vendors, though much of the work it does can be expressed in NFS commands.
pNFS can work with three types of storage—file-based storage,
object-based storage and block-based storage. The NFSv4.1 protocol
itself contains the file-based storage protocol. Additional RFCs are
being developed for the object-based and block-based protocols.
File-based storage is what most system administrators think of as storage;
it is the standard approach of nesting files within a hierarchical set of directories.
Block-based storage is used in Storage Area Networks (SANs), in which the applications access disk space directly,
by sending the Small Computer System Interface (SCSI) commands over
Fibre Channel, or, increasingly of late, TCP/IP via the Internet SCSI (iSCSI) protocol.
Object-based storage is somewhat of a newer beast, a parallel approach that involves embedding the data itself with self-describing metadata.
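As a rough illustration of the object idea (the field names and functions below are invented for this sketch, not any real object-storage API), an object pairs its data with self-describing attributes and is addressed by an id rather than by a path:

```python
# Illustrative only: an "object" in object-based storage carries its own
# metadata (size, owner, type, ...) alongside the data it stores.

object_store = {}

def put_object(oid, data, **metadata):
    # The object describes itself; no directory hierarchy is involved.
    object_store[oid] = {"data": data,
                         "metadata": {"size": len(data), **metadata}}

def get_object(oid):
    return object_store[oid]

put_object(42, b"hello", owner="alice", content_type="text/plain")
obj = get_object(42)
print(obj["metadata"]["size"], obj["metadata"]["owner"])   # 5 alice
```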
A word on semantics: Keep in mind that just as NFS is not a file system itself, neither is pNFS.
NFS provides the protocols to work with remote files as if they were local. Likewise, pNFS offers the
ability to work with files managed by a parallel file system as if they were on a local drive, handling
such tasks as setting permissions and ensuring data integrity. Fortunately, a number of parallel file systems have been
spawned over the past few years that should work easily with pNFS.
On the open source front, there is the Parallel Virtual File
System (PVFS). Perhaps the most widely used
open-source parallel file system is Lustre, now overseen by Sun
Microsystems. On the commercial front, Panasas' PanFS file system has
been successfully deployed in high performance computer clusters, as has IBM's General
Parallel File System (GPFS). All of these approaches use a similar idea—let the
clients talk to the storage server's devices directly, while having some
form of metadata server keep track of the storage layout. But most other
options rely on using a single vendor's gear.
"The main advantage [to using pNFS] is expected to be on the client
side," noted CITI programmer J. Bruce Fields, who does the NFS 4.1
testing on Linux servers. With most parallel file systems you have to do some
kernel reconfigurations on the clients so that they can work with the file systems. With the prototype
Linux client, you can run a standard mount command and get the files you need. "The client will automatically negotiate
pNFS and find the data servers. By the time we're done that should work
on any out-of-the-box Linux client from the distribution of your
choice," he says.
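Assuming a kernel with pNFS support, the mount Fields describes would look much like an ordinary NFSv4 mount. The exact option spelling has varied across kernel versions, so treat this line as a sketch rather than a guaranteed invocation:

```shell
# Hypothetical mount on a pNFS-capable kernel; server name and export
# path are placeholders, and some kernels spell the option vers=4.1.
mount -t nfs4 -o minorversion=1 mds.example.com:/export /mnt/pnfs
```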
The advantage that pNFS will bring is familiarity, and that it will come
already built in as part of NFS. Since NFS is a standard component in almost
all Linux kernel builds, that will greatly reduce the amount of
work administrators need to do to set up a parallel file system for
Linux servers. Most administrators are more familiar with the
general operating procedures of NFS, much more so than dealing directly with, say, Lustre,
which requires numerous kernel patches and a different mindset when it
comes to understanding commands.
pNFS should help storage vendors as well, as they will not have to port
client software to numerous Linux distributions. Welch, for instance, noted that Panasas has to maintain code for dozens of different Linux distributions. Instead, they can
rely on NFS and focus on storage devices. Already, Panasas, NetApp, EMC,
and IBM have all promised [PDF] to
support pNFS in at least some of their storage products, according to a
collective talk some of the developers gave last month at the SC08 conference. Sun Microsystems also plans to support pNFS in Solaris.
And while much of the early focus of pNFS has been for large scale
cluster operations, one day it may be feasible that even workstations
and desktops will use pNFS in some form. LANL's Gary Grider pointed out that,
"at some point, having several teraflops may even be possible in
your office, in which case you may need something more than just NFS for
data access for such a powerful personal system. pNFS may end up being
handy in this environment as well."
Indeed. Once upon a time we were limited to working on files on our own machines,
FTP'ing in anything that was located elsewhere. But NFS allowed us to mount drives across
the network with a relatively simple command. Now, pNFS may take things a step further,
by allowing us to read and write large files, or myriad files, at speeds we can now only dream about. At least, that is the promise of pNFS.