Circular pipes
Pipes now use a circular buffer, as inexpertly shown in the diagram below:
![Circular pipe diagram](https://static.lwn.net/images/ns/kernel/circular-pipe.png)
The curbuf pointer (it's an integer index, actually) indicates the first buffer in the array which contains data; nrbufs tells how many buffers contain data. The page structures are allocated when needed, and do not hang around when not in use. Since both readers and writers can manipulate nrbufs, some sort of locking (the pipe semaphore, in this case) is needed to serialize access. The pipe_buffer structure includes length and offset fields, so each entry in the circular buffer can contain less than a full page of data.
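A minimal user-space model may make the index arithmetic concrete. The field names (curbuf, nrbufs, offset, len) and the sixteen-slot default come from the description above; the layout here is a sketch, not a copy of the kernel source:

```c
#define PIPE_BUFFERS 16           /* default ring size: 16 slots */

struct pipe_buffer {
    void *page;                   /* stands in for the kernel's struct page * */
    unsigned int offset, len;     /* valid data within the page */
};

struct pipe_ring {
    unsigned int curbuf;          /* index of the first slot holding data */
    unsigned int nrbufs;          /* how many slots currently hold data */
    struct pipe_buffer bufs[PIPE_BUFFERS];
};

/* The slot a writer would fill next; the masking works because the
 * ring size is a power of two. */
static unsigned int next_free_slot(const struct pipe_ring *p)
{
    return (p->curbuf + p->nrbufs) & (PIPE_BUFFERS - 1);
}
```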
Linus says that the new implementation gives a "30-90%" improvement in pipe bandwidth, with only a small cost in latency (since pages must be allocated when data passes through the pipe). The performance improvements are entirely attributable to the larger amount of buffering; readers and writers will block less often when passing data through the pipe. It is a way of speeding things up by throwing memory at the problem.
Better pipe performance was not Linus's main purpose in making this change, however; he has a longer-term plan in mind. The mechanism used to implement circular pipes will evolve into a general mechanism for passing data streams through the kernel. Quite a few changes will be required to get there, and there seems to be no hurry, but there is clearly a long-term goal in mind.
Among other things, the buffers within the circular structure will gain a reference count, allowing there to be multiple readers or writers. The idea here is to implement a sort of in-kernel tee operation which would let data streams be split without additional copying. The example given by Linus is some sort of video capture device which would feed its data into one of these buffers. A process could obtain data from the buffer and display it in an on-screen window; meanwhile, another process would be capturing the stream and writing it to a file somewhere - perhaps with little or no user-space intervention.
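As a sketch of that direction (the field and structure here are hypothetical; the actual design was still unsettled), a per-buffer reference count is what would let two consumers hold the same page:

```c
/* Hypothetical extension of the pipe_buffer sketched above: a
 * reference count lets two readers (say, a display process and a
 * file writer) hold the same page without copying it; the page is
 * released only when the count drops to zero. */
struct refcounted_pipe_buffer {
    void *page;
    unsigned int offset, len;
    unsigned int refcount;
};
```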
The circular buffers will also gain the usual structure full of method pointers which would allow specific users to change how the basic operations are performed. Once that is in place, two new system calls would be added:
- splice(int infd, int outfd): would use a circular buffer to transfer data from infd to outfd, possibly in a zero-copy manner.
- tee(int infd, int out1, int out2): arranges for data from infd to go to both out1 and out2.
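Neither call exists yet, so any example is purely illustrative. With that caveat, here is how the proposed interfaces might be used in the video-capture scenario described above:

```c
#include <fcntl.h>

/* Prototypes as proposed in the discussion; these system calls do
 * not exist, so this sketch cannot be linked against a real kernel. */
int splice(int infd, int outfd);
int tee(int infd, int out1, int out2);

int capture_and_display(int display_fd)
{
    /* Hypothetical capture device and output file. */
    int capture_fd = open("/dev/video0", O_RDONLY);
    int file_fd = open("capture.raw", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (capture_fd < 0 || file_fd < 0)
        return -1;

    /* One call splits the capture stream to both consumers; the
     * buffer pages would be shared, not copied. */
    return tee(capture_fd, display_fd, file_fd);
}
```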
Longtime followers of Linux kernel discussions will notice a strong similarity between all of the above and Larry McVoy's splice proposal. Linus's implementation works at a lower level, however, and avoids many of the problems he saw with Larry's approach. Those who are curious about where all this is going may want to look at this explanation from Linus, in which he goes into detail.
There is a remaining practical issue with the current implementation: no coalescing of data written into a circular buffer is performed. Linus did things that way because he wants to make life easy for high-bandwidth, zero-copy streams using these buffers. To that end, nothing touches a page once it has been added to a buffer. The problem is that, in the worst case, a process writing a single byte at a time to a pipe can consume 16 pages of memory (with the default configuration) to hold 16 bytes worth of data. Linus initially noted that nobody doing single-byte I/O should expect good performance, and suggested that people not do that. It turns out, however, that this behavior breaks a crucial application - highly parallel kernel compiles. So coalescing of writes is likely to be added in the near future.
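Coalescing would not require much machinery. Here is a sketch of the check a writer might perform, building on the user-space model sketched earlier; the can_merge flag is an assumption about how zero-copy producers would opt out, not a description of the actual patch:

```c
#include <string.h>

#define PAGE_SIZE 4096   /* assumed page size for this sketch */

/* Try to append new data to the most recently filled buffer instead
 * of claiming a fresh page.  A zero-copy producer would pass
 * can_merge == 0 so that its pages are never touched after being
 * added to the ring. */
static int try_coalesce(struct pipe_ring *p, const char *data,
                        unsigned int len, int can_merge)
{
    unsigned int last;
    struct pipe_buffer *buf;

    if (!can_merge || p->nrbufs == 0)
        return 0;

    last = (p->curbuf + p->nrbufs - 1) & (PIPE_BUFFERS - 1);
    buf = &p->bufs[last];

    /* Append only if the new bytes fit in the tail of the page. */
    if (buf->offset + buf->len + len > PAGE_SIZE)
        return 0;

    memcpy((char *)buf->page + buf->offset + buf->len, data, len);
    buf->len += len;
    return 1;    /* coalesced; no new page consumed */
}
```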
Circular pipes

Posted Jan 14, 2005 14:48 UTC (Fri) by macc (guest, #510) [Link]

Isn't this rather similar to streams in SysV? Some years ago Linus was rather specific that there would be no streams in the kernel - mux, filters, ...

Reading SysV docs from Motorola for their embedded boards MVME147 (1991..) on the internals of the distributed VME Rack Systems was a very good read.
pipe performance

Posted Jan 14, 2005 16:45 UTC (Fri) by giraffedata (guest, #1954) [Link] (1 responses)

I suppose the 30-90% performance gain is for applications that do huge multi-page reads and writes with high latency. For the common case, for which the existing pipe implementation was designed, where there is a steady stream of about 80 byte reads and writes, it is most probably slower due to the additional overhead and the loss of locality of reference.

Also, one doesn't normally want the size of the pipe to increase without bound, so I'd say a parameter to limit the size is necessary. I think another parameter should choose between this and the old method.

I don't understand why this is being called "circular." The old method is more circular. In that, the same bytes of buffer get used over and over in a simple circular fashion.
pipe performance

Posted Jan 20, 2005 5:27 UTC (Thu) by sweikart (guest, #4276) [Link]

> For the common case, for which the existing pipe implementation was designed,
> where there is a steady stream of about 80 byte reads and writes ...

Actually, this is not the common case. The common case is that the programs writing to (and reading from) the pipe use stdio; when writing to a device that's not a tty, stdio defaults to block buffering. The block size used by stdio should work well with Linus's scheme (and if it doesn't, it would probably make sense to change it).

-scott
Circular pipes

Posted Apr 12, 2005 18:58 UTC (Tue) by DanWeber1 (guest, #29231) [Link]

These work for named pipes/fifos, right?

Dan