Sharing buffers between devices
The foreman's job will be easier if the various devices under its control can communicate easily with each other. One useful addition in this area might be the buffer sharing patch set recently posted by Marek Szyprowski. The idea here is to make it possible for multiple kernel subsystems to share buffers under the control of user space. With this type of feature, applications could wire kernel subsystems together in problem-specific ways, then get out of the way, letting the devices involved process the data as it passes through.
There are (at least) a couple of challenges which must be dealt with to make this kind of functionality safe to export to applications. One is that the application should not be able to "create" buffers at arbitrary kernel addresses. Indeed, kernel-space addresses should not be visible to user space at all, so the kernel must provide some other way for an application to refer to a specific buffer. The other is that a shared buffer must not go away until all of its users have let go of it. A buffer may be created by a specific device driver, but it must persist, even if the device is closed, until nobody else expects it to be there.
The mechanism added in this patch set (this part in particular is credited to Tomasz Stanislawski) is relatively simple - though it will probably get more complex in the future. Kernel code wanting to make a buffer available to other parts of the kernel via user space starts by filling in one of these structures:
    struct shrbuf {
        void (*get)(struct shrbuf *);
        void (*put)(struct shrbuf *);
        unsigned long dma_addr;
        unsigned long size;
    };
One could immediately raise a number of complaints about this structure: the address should be a dma_addr_t, there's no reason not to put the kernel virtual address there, only physically-contiguous buffers are allowed, etc. It also seems like there could be value in the ability to annotate the state of the buffer (filled or empty, for example) and possibly signal another thread when that state changes. But it's worth remembering that this is an explicitly proof-of-concept patch posting and a lot of things will change. In particular, the eventual plan is to pass a scatterlist around instead of a single physical address.
The get() and put() functions are important: they manage reference counts to the buffer, which must continue to exist until that count goes to zero. Any subsystem depending on a buffer's continued existence should hold a reference to that buffer. The put() function should release the buffer when the last reference is dropped.
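As a rough illustration of the reference-counting contract described above, here is a minimal user-space sketch. Only struct shrbuf comes from the patch set; the demo_buf wrapper, its plain integer counter, the released flag, and demo_buf_create() are assumptions made up for this example. Real kernel code would use an atomic counter or a kref, and would free DMA memory rather than call free().

```c
#include <assert.h>
#include <stdlib.h>

/* The structure as shown in the article. */
struct shrbuf {
    void (*get)(struct shrbuf *);
    void (*put)(struct shrbuf *);
    unsigned long dma_addr;
    unsigned long size;
};

/* Hypothetical driver-private wrapper (not part of the patch set). */
struct demo_buf {
    struct shrbuf sb;   /* must stay first so the casts below are valid */
    int refcount;       /* plain int: this sketch is single-threaded */
    int *released;      /* set to 1 on release, so callers can observe it */
};

static void demo_get(struct shrbuf *sb)
{
    struct demo_buf *b = (struct demo_buf *)sb;

    b->refcount++;
}

static void demo_put(struct shrbuf *sb)
{
    struct demo_buf *b = (struct demo_buf *)sb;

    assert(b->refcount > 0);
    if (--b->refcount == 0) {
        /* Last reference dropped: release the buffer. */
        *b->released = 1;
        free(b);
    }
}

/* Create a buffer holding one reference, owned by the creator. */
struct shrbuf *demo_buf_create(unsigned long size, int *released)
{
    struct demo_buf *b = calloc(1, sizeof(*b));

    if (!b)
        return NULL;
    b->sb.get = demo_get;
    b->sb.put = demo_put;
    b->sb.dma_addr = 0;     /* placeholder: no real DMA memory here */
    b->sb.size = size;
    b->refcount = 1;
    b->released = released;
    *released = 0;
    return &b->sb;
}
```

A second user would call sb->get(sb) before relying on the buffer and sb->put(sb) when finished; the buffer is released only when the creator and every other user have called put().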
Once this structure exists, it can be passed to:
int shrbuf_export(struct shrbuf *sb);
The return value (if all goes well) will be an integer file descriptor which can be handed to user space. This file descriptor embodies a reference to the buffer, which now will not be released before the file descriptor is closed. Apart from closing it, there is very little that the application can do with the descriptor other than give it to another kernel subsystem; attempts to read from or write to it will fail, for example.
If a kernel subsystem receives a file descriptor which is purported to represent a kernel buffer, it can pass that descriptor to:
struct shrbuf *shrbuf_import(int fd);
The return value will be the same shrbuf structure (or an ERR_PTR() error value for a file descriptor of the wrong type). A reference is taken on the structure before returning it, so the recipient should call put() at some future time to release it.
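The life cycle of those references can be modeled in user space. The sketch below is not the patch's implementation: mock_export(), mock_import(), mock_close(), the fixed-size descriptor table, and the counted_buf helper are all hypothetical stand-ins, and the mock returns NULL where the real shrbuf_import() is described as returning an ERR_PTR() value. It only tries to show who holds a reference at each step.

```c
#include <assert.h>
#include <stddef.h>

/* The structure as shown in the article. */
struct shrbuf {
    void (*get)(struct shrbuf *);
    void (*put)(struct shrbuf *);
    unsigned long dma_addr;
    unsigned long size;
};

/* Hypothetical stand-in for the kernel's file descriptor table. */
#define MOCK_MAX_FDS 8
static struct shrbuf *mock_table[MOCK_MAX_FDS];

/* Mock of shrbuf_export(): the descriptor holds its own reference. */
int mock_export(struct shrbuf *sb)
{
    for (int fd = 0; fd < MOCK_MAX_FDS; fd++) {
        if (!mock_table[fd]) {
            sb->get(sb);
            mock_table[fd] = sb;
            return fd;
        }
    }
    return -1;  /* table full */
}

/* Mock of shrbuf_import(): takes a reference for the caller. */
struct shrbuf *mock_import(int fd)
{
    if (fd < 0 || fd >= MOCK_MAX_FDS || !mock_table[fd])
        return NULL;    /* the real call is said to return ERR_PTR() */
    mock_table[fd]->get(mock_table[fd]);
    return mock_table[fd];
}

/* Mock of close(): drops the reference held by the descriptor. */
int mock_close(int fd)
{
    struct shrbuf *sb;

    if (fd < 0 || fd >= MOCK_MAX_FDS || !mock_table[fd])
        return -1;
    sb = mock_table[fd];
    mock_table[fd] = NULL;
    sb->put(sb);
    return 0;
}

/* Hypothetical buffer that only counts references, for observation. */
struct counted_buf {
    struct shrbuf sb;   /* must stay first so the casts below are valid */
    int refs;
};

static void counted_get(struct shrbuf *sb)
{
    ((struct counted_buf *)sb)->refs++;
}

static void counted_put(struct shrbuf *sb)
{
    struct counted_buf *b = (struct counted_buf *)sb;

    assert(b->refs > 0);
    b->refs--;
}

/* Initialize a buffer holding one reference, owned by the exporter. */
void counted_buf_init(struct counted_buf *b, unsigned long size)
{
    b->sb.get = counted_get;
    b->sb.put = counted_put;
    b->sb.dma_addr = 0;     /* placeholder: no real DMA memory here */
    b->sb.size = size;
    b->refs = 1;
}
```

In this model, an exporting driver can drop its own reference after mock_export() (as if its device had been closed) and the buffer remains referenced through the descriptor; an importer's reference then keeps it alive even after the descriptor itself is closed.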
The patch set includes a new Video4Linux2 ioctl() command (VIDIOC_EXPBUF) enabling the exporting of buffers as file descriptors; a couple of capture drivers have been augmented to support this functionality. No examples of the other side (importing a buffer) have been posted yet.
There has been relatively little commentary on the patch set so far,
possibly because it was posted to a couple of relatively obscure mailing
lists. It has the look of functionality that could be useful beyond one or
two kernel subsystems, though. It would probably make sense for the next
iteration, which presumably will have more of the anticipated functionality
built into it, to be distributed more widely for review.
| Index entries for this article | |
|---|---|
| Kernel | Device drivers/Support APIs |
Posted Aug 18, 2011 12:42 UTC (Thu)
by justincormack (subscriber, #70439)
[Link] (3 responses)
Posted Aug 18, 2011 14:04 UTC (Thu)
by cladisch (✭ supporter ✭, #50193)
[Link] (1 responses)
Posted Aug 18, 2011 14:11 UTC (Thu)
by justincormack (subscriber, #70439)
[Link]
Posted Aug 18, 2011 19:24 UTC (Thu)
by robclark (subscriber, #74945)
[Link]
Posted Aug 18, 2011 19:58 UTC (Thu)
by dougg (guest, #1894)
[Link] (2 responses)
Posted Aug 19, 2011 11:40 UTC (Fri)
by epa (subscriber, #39769)
[Link] (1 responses)
Posted Aug 19, 2011 20:31 UTC (Fri)
by giraffedata (guest, #1954)
[Link]
The shunting in and out of user space is easy to eliminate: mmap.
That leaves you with the copying from one device's buffer to the other. For that, Linus invented the 'splice' system call about ten years ago and, as justincormack points out in another comment on this article, actually implemented it in 2005. Splice takes two file descriptors and a length as arguments and reads that many bytes from one of the devices and writes them to the other, by DMAing into, then out of, the same memory.
https://lwn.net/Articles/119682/ . I don't know what the current state of deployment is, though.
The MCA thing would presumably be the next step, where the data doesn't have to stop over in system memory.
In big systems, where the storage devices are rather separate from the CPUs, this exists in the form that you can tell a device to send some of its contents directly to another device, e.g. through a fibre channel network. I guess the same thing over a PCI-class network can't be far behind.
In fact, my guess is that the bus protocol itself allows this in PCI Express and Infiniband; I don't think the CPU/main memory is particularly special in those protocols. Does somebody know?
Posted Aug 19, 2011 16:21 UTC (Fri)
by cavok (subscriber, #33216)
[Link] (1 responses)
Posted Aug 19, 2011 21:20 UTC (Fri)
by zlynx (guest, #2285)
[Link]
This is sendmsg/recvmsg with SCM_RIGHTS, I believe.
Posted Aug 25, 2011 2:36 UTC (Thu)
by quanstro (guest, #77996)
[Link]
Posted Aug 26, 2011 14:37 UTC (Fri)
by slashdot (guest, #22014)
[Link] (1 responses)
If the issue is accessibility via restricted DMA, then add some way to mmap ZONE_DMA or similar memory.
If the issue is physical contiguity, add support to mmap contiguous memory.
If the issue is synchronous IO, make the APIs asynchronous.
Posted Sep 10, 2011 2:53 UTC (Sat)
by smowton (guest, #57076)
[Link]
I *think* splice() should be able to accomplish the same feats as this patchset providing both ends can provide an FD that adequately represents what we want to do with the buffer; e.g. in the case that we're piping network packets to a video device, the video driver needs to be able to provide an FD representing the target video buffer, pixel format, etc, so something like:
    // fd is a socket
    int fd2 = ioctl(video_control_fd, IOCTL_GET_SINK_FD, /* description of sink buffer */);
    splice(fd, fd2, ...);
In the language of this patchset the network driver would provide an ioctl that yields a buffer FD, and the video driver would provide one that copies buffer data into specified video memory.
    int bufferfd;
    int fd = ioctl(socket_or_if_control_fd, IOCTL_GET_PACKET_BUFFER, &bufferfd);
    ioctl(video_control_fd, IOCTL_COPY_BUFFER_TO_VMEM, fd2, /* buffer description */);
Basically a splice()-based approach would need more ioctls in order to establish something that looks like a "connection endpoint" that means what we want it to, but mean less FD table operations if we want to repeatedly perform a similar operation (likely?), whilst an ioctl() + fd-per-buffer approach means lots of fd table operations (efficient locks in the table?) but less syscalls if we're routing buffers in a way that would effectively compel a splice operation to create a new channel per operation.
A more efficient dd between storage devices could be implemented if the data did not need to be shunted in and out of the user space.
May sharing buffers between devices cross also the application boundary? In such case, what happens to the fd number?
for over a decade with the kernel-only versions of read and write, bread and bwrite.
