Circular pipes
One of the many changes slipped quietly into BitKeeper over the last week
was this patch from Linus changing how
pipes are implemented internally. For a long time, pipes have used a
single page to buffer data between the reader and the writer. If a process
writes more than one page, it will block until the reader has consumed
enough data to allow the rest to fit within the buffer. The 2.6.11 pipe
implementation will be rather different.
Pipes now use a circular buffer, as inexpertly shown in the diagram below:
The curbuf pointer (it's an integer index, actually) indicates the first buffer in the array which contains data; nrbufs tells how many buffers contain data. The page structures are allocated when needed, and do not hang around when not in use. Since both readers and writers can manipulate nrbufs, some sort of locking (the pipe semaphore, in this case) is needed to serialize access. The pipe_buffer structure includes length and offset fields, so each entry in the circular buffer can contain less than a full page of data.
Linus says that the new implementation gives a "30-90%" improvement in pipe bandwidth, with only a small cost in latency (since pages must be allocated when data passes through the pipe). The performance improvements are entirely attributable to the larger amount of buffering; readers and writers will block less often when passing data through the pipe. It is a way of speeding things up by throwing memory at the problem.
Better pipe performance was not Linus's main purpose in making this change, however; he has a longer-term plan in mind. The mechanism used to implement circular pipes will evolve into a general mechanism for passing data streams through the kernel. Quite a few changes will be required to get there, and there seems to be no hurry, but there is clearly a long-term goal in mind.
Among other things, the buffers within the circular structure will gain a reference count, allowing there to be multiple readers or writers. The idea here is to implement a sort of in-kernel tee operation which would let data streams be split without additional copying. The example given by Linus is some sort of video capture device which would feed its data into one of these buffers. A process could obtain data from the buffer and display it in an on-screen window; meanwhile, another process would be capturing the stream and writing it to a file somewhere - perhaps with little or no user-space intervention.
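The curbuf/nrbufs bookkeeping described above can be illustrated with a small user-space sketch. This is not the kernel code: the names pipe_ring, ring_write, ring_read and SKETCH_PAGE_SIZE are invented for illustration, malloc() stands in for page allocation, a 4096-byte page is assumed, and the locking (the pipe semaphore) and blocking are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PIPE_BUFFERS     16    /* slots in the ring; 16 is the default cited in the article */
#define SKETCH_PAGE_SIZE 4096  /* assumed page size, for illustration only */

/* One ring entry. The offset/len pair lets an entry hold less than a full page. */
struct pipe_buffer {
    unsigned char *page;   /* stand-in for a struct page pointer; NULL when unused */
    unsigned int offset;   /* where the unread data starts within the page */
    unsigned int len;      /* how many unread bytes remain */
};

/* Hypothetical container for the ring; the field names follow the article. */
struct pipe_ring {
    unsigned int nrbufs;   /* number of entries that contain data */
    unsigned int curbuf;   /* index of the first entry that contains data */
    struct pipe_buffer bufs[PIPE_BUFFERS];
};

/* Store up to one page of data in the next free entry.
 * Returns the bytes stored, or 0 if the ring is full (a real pipe would block). */
static size_t ring_write(struct pipe_ring *r, const void *data, size_t n)
{
    if (n == 0 || r->nrbufs == PIPE_BUFFERS)
        return 0;
    if (n > SKETCH_PAGE_SIZE)
        n = SKETCH_PAGE_SIZE;

    /* The first free entry sits just past the occupied ones, wrapping around. */
    unsigned int slot = (r->curbuf + r->nrbufs) % PIPE_BUFFERS;
    struct pipe_buffer *b = &r->bufs[slot];

    b->page = malloc(SKETCH_PAGE_SIZE);   /* pages are allocated only when needed */
    if (b->page == NULL)
        return 0;
    memcpy(b->page, data, n);
    b->offset = 0;
    b->len = (unsigned int)n;
    r->nrbufs++;
    return n;
}

/* Copy out up to n bytes, releasing each page as soon as it is drained. */
static size_t ring_read(struct pipe_ring *r, void *out, size_t n)
{
    size_t done = 0;

    while (done < n && r->nrbufs > 0) {
        struct pipe_buffer *b = &r->bufs[r->curbuf];
        size_t chunk = (n - done < b->len) ? n - done : b->len;

        memcpy((unsigned char *)out + done, b->page + b->offset, chunk);
        b->offset += (unsigned int)chunk;
        b->len -= (unsigned int)chunk;
        done += chunk;

        if (b->len == 0) {                /* entry drained: free it and advance */
            free(b->page);
            b->page = NULL;
            r->curbuf = (r->curbuf + 1) % PIPE_BUFFERS;
            r->nrbufs--;
            assert(r->nrbufs < PIPE_BUFFERS);
        }
    }
    return done;
}
```

Starting from a zeroed struct pipe_ring, two five-byte writes occupy two entries; reading seven bytes then drains the first entry completely and leaves the second with an offset of 2 and a length of 3, which is the partial-page case the offset and length fields exist to handle.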
The circular buffers will also gain the usual structure full of method pointers which would allow specific users to change how the basic operations are performed. Once that is in place, two new system calls would be added:
Longtime followers of Linux kernel discussions will notice a strong similarity between all of the above and Larry McVoy's splice proposal. Linus's implementation works at a lower level, however, and avoids many of the problems he saw with Larry's approach. Those who are curious about where all this is going may want to look at this explanation from Linus, where he goes into detail and concludes:
I'm clearly enamoured with this concept. I think it's one of those
few "RightThing(tm)" that doesn't come along all that often. I
don't know of anybody else doing this, and I think it's both useful
and clever. If you now prove me wrong, I'll hate you forever ;)
There is a remaining practical issue with the current implementation. No coalescing of data written into a circular buffer is performed. Linus did things that way because he wants to make life easy for high-bandwidth, zero-copy streams using these buffers. To that end, nothing touches a page once it has been added to a buffer. The problem is that, in the worst case, a process writing a single byte at a time to a pipe can consume 16 pages of memory (with the default configuration) to hold 16 bytes' worth of data. Linus initially noted that nobody doing single-byte I/O should expect good performance, and suggested that people not do that. It turns out, however, that this behavior breaks a crucial application - highly parallel kernel compiles. So coalescing of writes is likely to be added in the near future.
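The worst case described above can be made concrete with a short counting sketch. It assumes a 4096-byte page and that no reader drains the pipe; slots_used and SKETCH_PAGE_SIZE are invented names, and the coalescing policy shown (append to the most recent page while it has room) is only one hypothetical way such merging could work, not a description of what the kernel does or will do.

```c
#include <assert.h>
#include <stddef.h>

#define PIPE_BUFFERS     16    /* ring entries in the default configuration */
#define SKETCH_PAGE_SIZE 4096  /* assumed page size, for illustration only */

/* Count the ring entries (one page each) consumed by `nwrites` writes of
 * `size` bytes, with no reader draining the pipe. Assumes 0 < size <= page size.
 * If `coalesce` is nonzero, a hypothetical policy appends each write to the
 * most recently used page whenever it still has room. */
static unsigned int slots_used(unsigned int nwrites, size_t size, int coalesce)
{
    unsigned int used = 0;
    size_t tail_fill = 0;   /* bytes already stored in the newest page */

    for (unsigned int i = 0; i < nwrites; i++) {
        if (coalesce && used > 0 && tail_fill + size <= SKETCH_PAGE_SIZE) {
            tail_fill += size;      /* fits in the existing page */
            continue;
        }
        if (used == PIPE_BUFFERS)
            break;                  /* ring full: the writer would block here */
        used++;                     /* take a fresh page for this write */
        tail_fill = size;
    }
    return used;
}
```

Under these assumptions, slots_used(16, 1, 0) is 16, so sixteen one-byte writes pin 16 x 4096 = 65536 bytes of page memory to hold 16 bytes of data, while slots_used(16, 1, 1) is 1. The trade-off is that coalescing means touching a page after it has been queued, which is what the current design avoids for the sake of zero-copy streams.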
Circular pipes
Posted Jan 14, 2005 14:48 UTC (Fri) by macc (subscriber, #510)
isn't this rather similar to streams in SYSV?
some years ago linus was rather specific that
mux, filters, ...
pipe performance
Posted Jan 14, 2005 16:45 UTC (Fri) by giraffedata (subscriber, #1954)
I suppose the 30-90% performance gain is for applications that do huge multi-page reads and writes with high latency. For the common case, for which the existing pipe implementation was designed, where there is a steady stream of about 80 byte reads and writes, it is most probably slower due to the additional overhead and the loss of locality of reference.
Also, one doesn't normally want the size of pipe to increase without bound, so I'd say a parameter to limit the size is necessary. I think another parameter should choose between this and the old method.
I don't understand why this is being called "circular." The old method is more circular. In that, the same bytes of buffer get used over and over in a simple circular fashion.
pipe performance
Posted Jan 20, 2005 5:27 UTC (Thu) by sweikart (guest, #4276)
> For the common case, for which the existing pipe implementation was designed,
> where there is a steady stream of about 80 byte reads and writes ...
Actually, this is not the common case. The common case is that the programs writing to (and reading from) the pipe use stdio; when writing to a device that's not a tty, stdio defaults to block buffering. The block size used by stdio should work well with Linus's scheme (and if it doesn't, it would probably make sense to change it).
-scott
Circular pipes
Posted Apr 12, 2005 18:58 UTC (Tue) by DanWeber1 (guest, #29231)
These work for named pipes/fifos right?
Dan
Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds