The previous installment in this series discussed how to transfer video
frames with the read() and write() system calls. Such an implementation can get the
basic job done, but it is not normally the preferred method for performing
video I/O. For the highest performance and the best information transfer,
video drivers should support the V4L2 streaming I/O API.
With the read() and write() methods, each video frame is
copied between user and kernel space as part of the I/O operation. When
streaming I/O is being used, this copying does not happen;
instead, the application and the driver exchange pointers to buffers.
These buffers will be mapped into the application's address space, making
it possible to perform zero-copy frame I/O. There are two
different types of streaming I/O buffers:
- Memory-mapped buffers (type V4L2_MEMORY_MMAP) are allocated
in kernel space; the application maps them into its address space with
the mmap() system call. The buffers can be large, contiguous
DMA buffers, virtual buffers created with vmalloc(), or, if
the hardware supports it, they can be located directly in the video
device's I/O memory.
- User-space buffers (V4L2_MEMORY_USERPTR) are allocated by the
application in user space. Clearly, in this situation, no
mmap() call is required, but the driver may have to work
harder to support efficient I/O to user-space buffers.
Note that drivers are not required to support streaming I/O, and, if they
do support streaming, they do not have to handle both buffer types. A
driver which is more flexible will support more applications; in practice,
it seems that most applications are written to use memory-mapped buffers.
It is not possible to use both types of buffer simultaneously.
We will now delve into the numerous grungy details involved in supporting
streaming I/O. Any Video4Linux2 driver writer will need to understand this
API; it is worth noting, however, that there is a higher-level API which
can help in the writing of streaming drivers. That layer (called
video-buf) can make life easier when the underlying device can support
scatter/gather I/O. The video-buf API will be discussed in a future
installment.
Drivers which support streaming I/O should inform the application of that
fact by setting the V4L2_CAP_STREAMING flag in their
vidioc_querycap() method. Note that there is no way to describe
which buffer types are supported; that comes later.
The v4l2_buffer structure
When streaming I/O is active, frames are passed between the application and
the driver in the form of struct v4l2_buffer. This structure is a
complicated beast which will take a while to describe. A good starting
point is to note that there are three fundamental states that a buffer can
be in:
- In the driver's incoming queue. Buffers are placed in this queue by
the application in the expectation that the driver will do something
useful with them. For a video capture device, buffers in the incoming
queue will be empty, waiting for the driver to fill them with video
data. For an output device, these buffers will have frame data to be
sent to the device.
- In the driver's outgoing queue. These buffers have been processed by
the driver and are waiting for the application to claim them. For
capture devices, outgoing buffers will have new frame data; for output
devices, these buffers are empty.
- In neither queue. In this state, the buffer is owned by user space
and will not normally be touched by the driver. This is the only time
that the application should do anything with the buffer. We'll call
this the "user space" state.
These states, and the operations which cause transitions between them, come
together as shown in the diagram below:
The actual v4l2_buffer structure looks like this:
struct v4l2_buffer
{
    __u32 index;
    enum v4l2_buf_type type;
    __u32 bytesused;
    __u32 flags;
    enum v4l2_field field;
    struct timeval timestamp;
    struct v4l2_timecode timecode;
    __u32 sequence;
    /* memory location */
    enum v4l2_memory memory;
    union {
        __u32 offset;
        unsigned long userptr;
    } m;
    __u32 length;
    __u32 input;
    __u32 reserved;
};
The index field is a sequence number identifying the buffer; it is
only used with memory-mapped buffers. Like other objects which can be
enumerated in the V4L2 interface, memory-mapped buffers start with index 0
and go up sequentially from there. The type field describes the
type of the buffer, usually V4L2_BUF_TYPE_VIDEO_CAPTURE or
V4L2_BUF_TYPE_VIDEO_OUTPUT.
The size of the buffer is given by length, which is in bytes. The
size of the image data contained within the buffer is found in
bytesused; obviously bytesused <= length.
For capture devices, the driver will set bytesused; for output
devices the application must set this field.
field describes which field of an image is stored in the buffer;
fields were discussed in part 5a of this series.
The timestamp field, for input devices, tells when the frame was
captured. For output devices, the driver should not send the frame out
before the time found in this field; a timestamp of zero means "as
soon as possible." The driver will set timestamp to the time that
the first byte of the frame was transferred to the device - or as close to
that time as it can get. timecode can be used to hold a timecode value,
useful for video editing applications; see this
table for details on timecodes.
The driver maintains an incrementing count of frames passing through the
device; it stores the current sequence number in sequence as each
frame is transferred. For input devices, the application can watch this
field to detect dropped frames.
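An application could turn that observation into a small check along the following lines. This is a sketch under the assumption that the driver increments sequence by exactly one per frame; the function name frames_dropped() is invented for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of frames missed between two buffers
 * dequeued back to back. Unsigned arithmetic keeps the result correct
 * when the 32-bit counter wraps around.
 */
uint32_t frames_dropped(uint32_t prev_sequence, uint32_t cur_sequence)
{
    return cur_sequence - prev_sequence - 1;
}
```

Consecutive sequence numbers yield zero; a jump from 5 to 8 means two frames never reached the application.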
memory tells whether the buffer is memory-mapped or user-space.
For memory-mapped buffers, m.offset describes where the buffer is
to be found. The specification describes it as "the offset of the
buffer from the start of the device memory," but the truth of the
matter is that it is simply a magic cookie that the application can pass to
mmap() to specify which buffer is being mapped. For user-space
buffers, instead, m.userptr is the user-space address of the
buffer.
The input field can be used to quickly switch between inputs on a
capture device - assuming the device supports quick switching between
frames. The reserved field should be set to zero.
Finally, there are several flags defined:
- V4L2_BUF_FLAG_MAPPED indicates that the buffer
has been mapped into user space. It is only applicable to
memory-mapped buffers.
- V4L2_BUF_FLAG_QUEUED: the buffer is in the driver's incoming
queue.
- V4L2_BUF_FLAG_DONE: the buffer is in the driver's outgoing
queue.
- V4L2_BUF_FLAG_KEYFRAME: the buffer holds a key frame - useful
in compressed streams.
- V4L2_BUF_FLAG_PFRAME and V4L2_BUF_FLAG_BFRAME are
also used with compressed streams; they indicate predicted or
difference frames.
- V4L2_BUF_FLAG_TIMECODE: the timecode field is valid.
- V4L2_BUF_FLAG_INPUT: the input field is valid.
Buffer setup
Once a streaming application has performed its basic setup, it will turn to
the task of organizing its I/O buffers. The first step is to establish a
set of buffers with the VIDIOC_REQBUFS ioctl(), which is
turned by V4L2 into a call to the driver's vidioc_reqbufs()
method:
int (*vidioc_reqbufs) (struct file *file, void *private_data,
struct v4l2_requestbuffers *req);
Everything of interest will be in the v4l2_requestbuffers
structure, which looks like this:
struct v4l2_requestbuffers
{
    __u32 count;
    enum v4l2_buf_type type;
    enum v4l2_memory memory;
    __u32 reserved[2];
};
The type field describes the type of I/O to be done; it will
usually be either V4L2_BUF_TYPE_VIDEO_CAPTURE for a video
acquisition device or V4L2_BUF_TYPE_VIDEO_OUTPUT for an output
device. There are other types, but they are beyond the scope of this
article.
If the application wants to use memory-mapped buffers, it will set
memory to V4L2_MEMORY_MMAP and count to the
number of buffers it wants to use. If the driver does not support
memory-mapped buffers, it should return -EINVAL. Otherwise, it
should allocate the requested buffers internally and return zero. On
return, the application will expect the buffers to exist, so any part of
the task which could fail (memory allocation, for example) should be done
at this stage.
Note that the driver is not required to allocate exactly the requested number of
buffers. In many cases there is a minimum number of buffers which makes
sense; if the application requests fewer than the minimum, it may actually
get more buffers than it asked for. In your editor's experience, for
example, the mplayer application will request two buffers, which
makes it susceptible to overruns (and thus lost frames) if things slow
down in user space. By enforcing a higher minimum buffer count (adjustable with a module
parameter), the cafe_ccic driver is able to make the streaming I/O path a
little more robust.
The count field should be set
to the number of buffers actually allocated before the method returns.
Setting count to zero is a way for the application to request that
all existing buffers be released. In this case, the driver must stop any
DMA operations before freeing the buffers or terrible things could happen.
It is also not possible to free buffers if they are currently mapped into
user space.
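The count handling described above can be reduced to a few lines. The sketch below is not taken from any real driver: the limits SKETCH_MIN_BUFFERS and SKETCH_MAX_BUFFERS and the function granted_buffer_count() are made up, and the upper limit in particular is an assumption that the text does not discuss.

```c
#define SKETCH_MIN_BUFFERS 3    /* hypothetical minimum for robust streaming */
#define SKETCH_MAX_BUFFERS 32   /* hypothetical upper limit */

/*
 * Decide how many buffers to allocate for a VIDIOC_REQBUFS request.
 * The result is what the driver would store back into count.
 */
unsigned int granted_buffer_count(unsigned int requested)
{
    if (requested == 0)
        return 0;   /* request to release all existing buffers */
    if (requested < SKETCH_MIN_BUFFERS)
        return SKETCH_MIN_BUFFERS;
    if (requested > SKETCH_MAX_BUFFERS)
        return SKETCH_MAX_BUFFERS;
    return requested;
}
```

A real method would follow this decision with the actual allocation (or, for a count of zero, with stopping DMA and freeing the buffers) and would fail the request if that work cannot be completed.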
If, instead, user-space buffers are to be used, the only fields which
matter are the buffer type and a value of
V4L2_MEMORY_USERPTR in the memory field. The application
need not specify the number of buffers that it intends to use; since the
allocation will be happening in user space, the driver need not care. If
the driver supports user-space buffers, it need only note that the
application will be using this feature and return zero; otherwise the usual
-EINVAL return is called for.
The VIDIOC_REQBUFS command is the only way for an application to
discover which types of streaming I/O buffer are supported by a given
driver.
Mapping buffers into user space
If user-space buffers are being used, the driver will not see any more
buffer-related calls until the application starts putting buffers on the
incoming queue. Memory-mapped buffers require more setup, though. The
application will typically step through each allocated buffer and map it
into its address space. The first stop is the VIDIOC_QUERYBUF
command, which becomes a call to the driver's vidioc_querybuf()
method:
int (*vidioc_querybuf)(struct file *file, void *private_data,
struct v4l2_buffer *buf);
On entry to this method, the only fields of buf which will be set
are type (which should be checked against the type specified when
the buffers were allocated) and index, which identifies the
specific buffer. The driver should make sure that index makes
sense and fill in the rest of the fields in buf. Typically
drivers store an array of v4l2_buffer structures internally, so
the core of a vidioc_querybuf() method is just a structure
assignment.
The only way for an application to access memory-mapped buffers is to map
them into its address space, so a vidioc_querybuf() call will
typically be followed by a call to the driver's mmap() method -
this method, remember, is stored in the fops field of the
video_device structure associated with this device. How the
driver handles mmap() will depend on just how the buffers are set
up in the kernel. If the buffer can be mapped up front with
remap_pfn_range() or remap_vmalloc_range(), that should
be done at this time. For buffers in kernel space, pages can also be
mapped individually at page-fault time by setting up a nopage()
method in the usual
way. A good discussion of handling mmap() can be found in Linux Device Drivers for those who need it.
When mmap() is called, the VMA structure passed in should have the
address of one of your buffers in the vm_pgoff field -
right-shifted by PAGE_SHIFT, of course. It should, in particular,
be the offset value that your driver returned in response to a
VIDIOC_QUERYBUF call. Please iterate through your list of buffers
and be sure that the incoming address matches one of them; video drivers
should not be a means by which hostile programs can map arbitrary regions
of memory.
The offset value you provide can be almost anything,
incidentally. Some drivers just return (index<<PAGE_SHIFT),
meaning that the incoming vm_pgoff field should just be the buffer
index. The one thing you should not do is store the actual
kernel-space address of the buffer in offset; leaking kernel
addresses into user space is never a good idea.
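The index-based cookie scheme, together with the validation that mmap() should perform, can be modeled in plain C as follows. This is a stand-alone sketch, not kernel code: SKETCH_PAGE_SHIFT stands in for the kernel's PAGE_SHIFT (12 is assumed here), and both function names are invented.

```c
#define SKETCH_PAGE_SHIFT 12    /* stand-in for PAGE_SHIFT; 4096-byte pages assumed */

/* The value a driver would return in m.offset for a given buffer. */
unsigned long buffer_cookie(unsigned int index)
{
    return (unsigned long)index << SKETCH_PAGE_SHIFT;
}

/*
 * What the mmap() method would do with the incoming vm_pgoff: accept
 * it only if it names one of the nbuffers allocated buffers.
 * Returns the buffer index, or -1 for a bogus offset.
 */
int buffer_index_from_pgoff(unsigned long vm_pgoff, unsigned int nbuffers)
{
    if (vm_pgoff >= nbuffers)
        return -1;
    return (int)vm_pgoff;
}
```

With this scheme the check is a single comparison; a driver using less regular cookies would have to walk its buffer list instead.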
When user space maps a buffer, the driver should set the
V4L2_BUF_FLAG_MAPPED flag in the associated v4l2_buffer
structure. It must also set up open() and close() VMA
operations so that it can track the number of processes which have the
buffer mapped. As long as this buffer remains mapped somewhere, it cannot
be released back to the kernel. If the mapping count of one or more
buffers drops to zero, the driver should also stop any in-progress I/O, as
there will be no process which can make use of it.
Streaming I/O
So far we have looked at a lot of setup without the transfer of a single
frame. We're getting closer, but there is one more step which must happen
first. When the application obtains buffers with VIDIOC_REQBUFS,
those buffers are all in the user-space state; if they are user-space
buffers, they do not really even exist yet. Before the application can
start streaming I/O, it must put at least one buffer into the driver's
incoming queue; for an output device, of course, those buffers should also
be filled with valid frame data.
To enqueue a buffer, the application will issue a VIDIOC_QBUF
ioctl(), which V4L2 maps into a call to the driver's
vidioc_qbuf() method:
int (*vidioc_qbuf) (struct file *file, void *private_data,
struct v4l2_buffer *buf);
For memory-mapped buffers, once again, only the type and
index fields of buf are valid. The driver can just
perform the obvious checks (type and index make sense,
the buffer is not already on one of the driver's queues, the buffer is
mapped, etc.), put the buffer on its incoming queue (setting the
V4L2_BUF_FLAG_QUEUED flag), and return.
User-space buffers can be more complicated at this point, because the
driver will never have seen this buffer before. When using this method,
applications are allowed to pass a different address every time they enqueue
a buffer, so the driver can do no setup ahead of time. If your driver is
bouncing frames through a kernel-space buffer, it need only make a note of
the user-space address provided by the application. If you are trying to
DMA the data directly into user-space, however, life is significantly more
challenging.
To ship data directly into user space, the driver must first fault in all
of the pages of the buffer and lock them into place;
get_user_pages() is the tool to use for this job. Note that this
function can perform significant amounts of memory allocation and disk I/O
- it could block for a long time. You will need to take care to ensure
that important driver functions do not stall while
get_user_pages(), which can block for long enough for many video
frames to go by, does its thing.
Then there is the matter of telling the device to transfer image data to
(or from) the user-space buffer. This buffer will not be contiguous in
physical memory - it will, instead, be broken up into a large number of
separate 4096-byte pages (on most architectures). Clearly, the device will
have to be able to do
scatter/gather DMA operations. If the device transfers full video frames
at once, it will need to accept a scatterlist which holds a great many
pages; a VGA-resolution image in a 16-bit format requires 150 pages. As
the image size grows, so will the size of the scatterlist. The V4L2
specification says:
If required by the hardware the driver swaps memory pages within
physical memory to create a continuous area of memory. This happens
transparently to the application in the virtual memory subsystem of
the kernel.
Your editor, however, is unwilling to recommend that driver writers attempt
this kind of deep virtual memory trickery. A more promising approach could
be to require user-space buffers to be located in hugetlb pages, but no
drivers do that now.
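The 150-page figure quoted above is simple arithmetic, and the same calculation shows how quickly the scatterlist grows with resolution. The helper below is purely illustrative; it assumes a page-aligned buffer (an unaligned one can straddle one additional page).

```c
#include <stddef.h>

/*
 * Number of pages needed to hold one frame, rounding up to a whole
 * page. Assumes the buffer starts on a page boundary.
 */
size_t pages_for_frame(size_t width, size_t height,
                       size_t bytes_per_pixel, size_t page_size)
{
    size_t bytes = width * height * bytes_per_pixel;

    return (bytes + page_size - 1) / page_size;
}
```

For 640x480 at two bytes per pixel with 4096-byte pages, this gives 614400 / 4096 = 150 pages; a 1280x720 frame in the same format needs 450.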
If your device transfers images in smaller pieces (a USB camera, for
example), direct DMA to user space may be easier to set up. In any case,
when faced with the challenges of supporting direct I/O to user-space
buffers, the driver writer should (1) be sure that it is worth the
trouble, given that applications tend to expect to use memory-mapped
buffers anyway, and (2) make use of the video-buf layer, which can
handle some of the pain for you.
Once streaming I/O starts, the driver will grab buffers from its incoming
queue, have the device perform the requested transfer, then move the buffer
to the outgoing queue. The buffer flags should be adjusted accordingly
when this transition happens; fields like the sequence number and time stamp
should also be filled in at this time. Eventually the application will want to claim
buffers in the outgoing queue, returning them to the user-space state.
That is the job of VIDIOC_DQBUF, which becomes a call to:
int (*vidioc_dqbuf) (struct file *file, void *private_data,
struct v4l2_buffer *buf);
Here, the driver will remove the first buffer from the outgoing queue,
storing the relevant information in *buf. Normally, if the
outgoing queue is empty, this call should block until a buffer becomes
available. V4L2 drivers are expected to handle non-blocking I/O, though, so if the
video device has been opened with O_NONBLOCK, the driver should
return -EAGAIN in the empty-queue case. Needless to say, this
requirement also implies that the driver must support poll() for
streaming I/O.
The only remaining step is to actually tell the device to start performing
streaming I/O. The Video4Linux2 driver methods for this task are:
int (*vidioc_streamon) (struct file *file, void *private_data,
enum v4l2_buf_type type);
int (*vidioc_streamoff)(struct file *file, void *private_data,
enum v4l2_buf_type type);
The call to vidioc_streamon() should start the device after
checking that type makes sense. The driver can, if need be,
require that a certain number of buffers be in the incoming queue before
streaming can be started.
When the application is done it should generate a call to
vidioc_streamoff(), which must stop the device. The driver should
also remove all buffers from both the incoming and outgoing queues, leaving
them all in the user-space state. Of course, the driver must be prepared
for the application to simply close the device without stopping streaming
first.