Kernel development

Brief items

Kernel release status

The current 2.6 development kernel is 2.6.32-rc8, released on November 19. "The way things are going, this will likely be the last -rc. I wish we had more people looking at the regression list, but at some point I'm just going to have to say 'ok, enough is enough'." Details may be found in the full changelog.

There have been no stable kernel updates in the last week.

Quotes of the week

Broadly speaking, staging WiFi drivers come in two flavors: (a) old dried gum from under the cafeteria table (drivers with a future), and (b) fresh vomit from the hung-over kid in your math class (those without a future).
-- Dan Williams

One man's obfuscation is another man's abstraction.
-- Frank Ch. Eigler

Writing a Linux distribution is hard. There's a huge range of interconnected dependencies. It takes a long time to learn how everything fits together, and fixing things properly rather than adding device-specific hacks often requires rewriting a lot of code. I'm sure Google will figure it out in time, and I'm also sure that the majority of their work is going into their UI rather than the underlying infrastructure. But even so, don't expect that you'll be able install Chromium OS on a random piece of hardware and have it work as well as, say, Fedora in the near future.
-- Matthew Garrett

LogFS returns

By Jonathan Corbet
November 24, 2009
LogFS is a longstanding project by Jörn Engel to create a filesystem for contemporary solid-state storage devices; it was last covered here in May, 2007. Since then, LogFS has mostly disappeared from view. As of November 20, though, LogFS is back and, seemingly, ready for a mainline merge. Jörn says:

Logfs has been around a couple of times. Linus last word was "go and don't come back until all format changes are done". Or something along those lines at least. Format changes are done. And I don't even intend to break git-bisect for anyone crazy enough to use logfs for /.

Sufficiently crazy users seem to be relatively scarce so far. But having more options for upcoming hardware can only be a good thing; it will be interesting to see what results come out as people start to play with this new filesystem.

Snapshot merge for the device mapper

By Jonathan Corbet
November 24, 2009
Last week, LWN looked at the use of Btrfs snapshots to help system administrators recover from problematic upgrades. Btrfs is not the only snapshot mechanism in the kernel, though; the device mapper layer has had this capability for some time. What is missing from DM is the ability to restore the "origin" (main) device to an earlier state if need be. So the device mapper, in its current form, cannot be used to roll back an unfortunate upgrade without taking the system down and copying data.

That situation could change soon, possibly as early as 2.6.33. Mike Snitzer has posted patches for a snapshot-merge target for DM. This target, simply, merges a snapshot back to the origin device, restoring the state of that device to what it was when the snapshot was taken. So a system administrator could snapshot the device immediately prior to an upgrade, then get back to the pre-upgrade state if things do not go well.

One nice feature is that merging a snapshot preserves the state of all other snapshots on the device. So our system administrator could take another snapshot after the failed upgrade, before returning to the previous state. That post-upgrade snapshot would continue to exist, allowing the cherry-picking of any files with changes that should persist after the system as a whole is rolled back.

DM maintainer Alasdair Kergon has told your editor that he'll be reviewing this code shortly, and that it may find its way into linux-next in the near future.

Help wanted: kbuild maintainer

Sam Ravnborg, long-time maintainer of the kernel build (kbuild) subsystem, has announced his intention to step down from that role. "I have done this solely on a hobbyist basis and family (3 kids etc) + job require me so the kbuild maintainer job was becoming a duty and not that fun suddenly." It's not clear who the replacement will be. Thanks are due to Sam, who has left the state of kernel building far better than he found it.

Kernel development news

Who wrote 2.6.32

By Jonathan Corbet
November 24, 2009
As of this writing, the 2.6.32 kernel appears poised for a release right around the beginning of December. That can only mean that the time has come to look at the code which has gone into this kernel and where it came from. It has been another active cycle, with a lot of changes making it into the mainline.

In particular, as of this writing (shortly after the 2.6.32-rc8 release), 2.6.32 is the result of 10,767 non-merge changesets sent in by 1,229 developers. These changes added a total of 1.17 million lines, while removing 611,000 lines, for a net growth of 559,000 lines of code. According to Rafael Wysocki's regression reports, this development cycle introduced a total of 86 regressions into the kernel - slightly fewer than we saw for 2.6.31. As of that posting, the number of unresolved regressions was shrinking quickly, with 25 of them still without a resolution.

So who added all those lines of code? The statistics for this cycle look like this:

Most active 2.6.32 developers

By changesets
    Greg Kroah-Hartman               202    1.9%
    Johannes Berg                    180    1.7%
    Bartlomiej Zolnierkiewicz        164    1.5%
    Mark Brown                       154    1.4%
    Paul Mundt                       139    1.3%
    Takashi Iwai                     139    1.3%
    Alan Cox                         129    1.2%
    Roel Kluin                       115    1.1%
    Luis R. Rodriguez                105    1.0%
    Dan Williams                      86    0.8%
    Tejun Heo                         84    0.8%
    Herbert Xu                        81    0.8%
    Peter Zijlstra                    80    0.7%
    Ingo Molnar                       77    0.7%
    Julia Lawall                      77    0.7%
    Steven Rostedt                    73    0.7%
    Magnus Damm                       72    0.7%
    Joe Perches                       71    0.7%
    Joerg Roedel                      70    0.7%

By changed lines
    Greg Kroah-Hartman            174427   11.5%
    Bartlomiej Zolnierkiewicz     108056    7.1%
    Mauro Carvalho Chehab          62719    5.2%
    Jing Huang                     49189    3.2%
    Forest Bond                    45009    3.0%
    Ben Hutchings                  37418    2.5%
    Eilon Greenstein               28008    1.8%
    Mark Brown                     24516    1.6%
    Brian Swetland                 22775    1.5%
    Hank Janssen                   19681    1.3%
    Leo Chen                       17458    1.2%
    Palash Bandyopadhyay           16790    1.1%
    Alan Cox                       16466    1.1%
    Mithlesh Thukral               15173    1.0%
    Jerome Glisse                  14343    0.9%
    Michael Chan                   13415    0.9%
    Martyn Welch                   12480    0.8%
    Iliyan Malchev                 12172    0.8%
    Jesse Brandeburg               11051    0.7%

As has become traditional, Greg Kroah-Hartman and Bartlomiej Zolnierkiewicz feature at the top of both lists. Much of Greg's work had to do with the cleaning up of Microsoft's "hv" drivers. His state of mind during this process is best assessed from the commit messages, which tend to read like this one:

The Linux kernel doesn't have all caps structures, we don't like to shout at our programmers, it makes them grumpy. Instead, we like to sooth them with small, rounded letters, which puts them in a nice, compliant mood, and makes them more productive and happier, allowing them more fufilling lives overall.

Greg also removed some drivers from the staging tree, shrinking the kernel by over 100,000 lines.

The bulk of Bartlomiej's work is also in the staging tree, and that is mostly concerned with fixing up a series of rather unloved wireless network drivers. These patches are somewhat controversial; the wireless developers would rather see that effort going into a different set of non-staging drivers. But those drivers are not yet ready for prime time, and, meanwhile, people are using the staging drivers. Wireless drivers were also the focus of Johannes Berg's work; he has made a long set of improvements to the mac80211 subsystem and its cfg80211 configuration interface. Mark Brown continues to contribute large amounts of code in support of Wolfson Micro's components, and Paul Mundt remains active as the Super-H maintainer.

In the "lines changed" column, Mauro Carvalho Chehab contributed a lot of patches as the Video4Linux2 maintainer. Jing Huang contributed the Brocade BFA FC SCSI driver, and Forest Bond added the VT6656 wireless driver to the staging tree.

Developers working on 2.6.32 were supported by (at least) 196 employers. The most active companies this time around are:

Most active 2.6.32 employers

By changesets
    Red Hat                         1028    9.5%
    Renesas Technology               264    2.5%
    Atheros Communications           197    1.8%
    Texas Instruments                155    1.4%
    Wolfson Micro                    153    1.4%
    Analog Devices                   124    1.2%

By lines changed
    Red Hat                       150781    9.9%
    Logic Supply                   45165    3.0%
    Wolfson Micro                  25577    1.7%
    Texas Instruments              24824    1.6%
    Renesas Technology             24507    1.6%
    LinSysSoft Technologies        15173    1.0%
    GE Fanuc                       12495    0.8%

The sharp-eyed reader will notice that Red Hat has fallen below 10% of the total changes - the first time that has happened since the 2.6.21 development cycle in early 2007. The number of changes from Red Hat this time around is only slightly lower than the usual, though; what's happening is that some of the other companies are catching up.

There are a couple of other interesting entries here. Google takes a lot of grief for not contributing back, but that company was the source of a fair amount of code going into 2.6.32. Much of that was support for the HTC "Dream" (aka G1 or ADP1) phone platform, but Google also contributed to control groups, ext4, memory management, IPVS, and libata. And one may have never expected to see Microsoft show up on the list of top kernel contributors, but the hv drivers put it there for 2.6.32.

The numbers for signoffs have not changed much from previous cycles:

Top non-author signoffs in 2.6.32

By developer
    David S. Miller                  996   10.2%
    John W. Linville                 994   10.2%
    Greg Kroah-Hartman               788    8.1%
    Andrew Morton                    786    8.1%
    Ingo Molnar                      501    5.1%
    Mauro Carvalho Chehab            398    4.1%
    James Bottomley                  310    3.2%
    Len Brown                        188    1.9%
    Paul Mundt                       171    1.8%
    Russell King                     165    1.7%

By employer
    Red Hat                         3606   37.1%
    Renesas Technology               180    1.9%
    Wolfson Micro                    155    1.6%

If anything, the subsystem maintainers are concentrating even more than before. Fully 2/3 of the patches going into the mainline kernel pass through the hands of developers working for just four companies.

At the 2009 Kernel Summit, the participants concluded that, while improvements can always be made, the process as a whole is working well. The picture that comes from these numbers suggests the same conclusion: the kernel development machine continues to absorb massive numbers of changes from a wide development community while continuing to produce stable, increasingly functional releases.

Journal-guided RAID resync

By Jonathan Corbet
November 24, 2009
The RAID4, 5, and 6 storage technologies are designed to protect against the failure of a single drive. Blocks of data are spread out across the array and, for each stripe, there is a parity block stored on one of the drives. Should one drive fail, the lost data can be recovered through the use of the remaining drives and the parity information. This mechanism copes less well with system crashes and power failures, though, forcing software RAID administrators to choose between speed and reliability. A new mechanism called journal-guided resynchronization may make life easier, but only if it actually gets into the kernel.

The problem is that data and parity blocks must be updated in an atomic manner; if the two go out of sync, then the RAID array is no longer in a position to recover lost data. Indeed, it could return corrupted data. Expensive hardware RAID solutions use battery backup to ensure that updates are not interrupted partway through, but software RAID solutions often do not have that option. So if the system crashes - or the power fails - in the middle of an update to a RAID volume, that volume could end up being corrupted. Computer users, being a short-sighted kind of people in general, tend to regard this as a Bad Thing.
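The parity relationship at stake here can be sketched in a few lines of C. This is a minimal user-space model, not code from the MD driver: it assumes simple RAID4/5-style XOR parity, and the names (compute_parity(), rebuild_block(), stripe_consistent()) and the tiny block and stripe sizes are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NDATA  3    /* data blocks per stripe (illustrative) */
#define BLKSZ  8    /* bytes per block (illustrative) */

/* Parity is the byte-wise XOR of all data blocks in the stripe. */
static void compute_parity(uint8_t data[NDATA][BLKSZ], uint8_t parity[BLKSZ])
{
    memset(parity, 0, BLKSZ);
    for (size_t d = 0; d < NDATA; d++)
        for (size_t i = 0; i < BLKSZ; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild the block on a failed drive from the survivors plus parity. */
static void rebuild_block(uint8_t data[NDATA][BLKSZ],
                          const uint8_t parity[BLKSZ],
                          size_t lost, uint8_t out[BLKSZ])
{
    memcpy(out, parity, BLKSZ);
    for (size_t d = 0; d < NDATA; d++) {
        if (d == lost)
            continue;
        for (size_t i = 0; i < BLKSZ; i++)
            out[i] ^= data[d][i];
    }
}

/* Returns 1 if the stored parity matches the data, 0 otherwise. */
static int stripe_consistent(uint8_t data[NDATA][BLKSZ],
                             const uint8_t parity[BLKSZ])
{
    uint8_t expect[BLKSZ];

    compute_parity(data, expect);
    return memcmp(expect, parity, BLKSZ) == 0;
}
```

If a crash lands between the write of a data block and the write of the matching parity block, stripe_consistent() fails for that stripe, and a later rebuild_block() call for a different drive in the same stripe silently produces wrong data. That is the corruption case described above.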

There are a couple of possible ways of mitigating this risk. One is to perform a full rescan of the RAID volume after a crash, fixing up any partially-updated stripes. The problem here is that (1) the correct fix for an inconsistent stripe may not always be clear, and (2) this process can take a long time. Long enough to cause users to think nostalgically about the days of fast, reliable floppy-disk storage.

An alternative approach is to introduce a type of journaling to the RAID layer. The RAID implementation can set aside some storage where, prior to changing the real array, it records each pending update (not necessarily the data itself; perhaps just the numbers of the affected stripes). This approach works, and it can recover a crashed RAID array without a full rescan, but there is a cost here too: that journaling can slow down the operation of the array significantly. Writes to the journal must be synchronous or it cannot be counted on to do its job, so write operations become far slower than they were before. Given that, it's not surprising that a lot of RAID administrators turn off RAID-level journaling and spend a lot of time hoping that nothing goes wrong.

A few years ago, Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau published a paper describing a better way, which they called "journal-guided resynchronization." Contemporary filesystems tend to do journaling of their own; why not use the filesystem journal to track changes to the RAID array as well? Running one journal can only be cheaper than running two - especially when one considers that the RAID journal must track, among other things, changes to the filesystem journal. The only problem is that the RAID and filesystem layers communicate through the relatively narrow block-layer API; using filesystem journaling to track RAID-level information has the potential to mix the layers considerably.

Jody McIntyre's journal-guided resync implementation adds a new "declared" mode to the ext3 filesystem. As the journal is being written, a new "declare block" is added describing exactly which blocks are to be written to the storage device. Those blocks are then written with a new BIO flag stating that the filesystem has taken responsibility for resynchronizing the stripe should something go wrong; that lets the storage layer forget about that particular problem. Should the system crash, the filesystem will find those declare blocks in the journal; it can then issue a (new) BIO_SYNCRAID operation asking the storage subsystem to resynchronize the specific stripes containing the listed blocks.
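To see why this makes recovery cheap, consider a rough user-space model of the recovery step. It is not taken from the patches: the function name stripes_to_resync(), the fixed result capacity, and the geometry (a stripe made of blocks_per_stripe consecutive filesystem blocks) are all assumptions made for illustration. The point is only that the work is bounded by the number of declared blocks, not by the size of the array.

```c
#include <stddef.h>

#define MAX_STRIPES 64  /* capacity of the result array (illustrative) */

/*
 * Hypothetical geometry: each stripe holds blocks_per_stripe consecutive
 * filesystem blocks, so block b lives in stripe b / blocks_per_stripe.
 * Fills stripes[] with the distinct stripes touched by the declared
 * blocks and returns how many there are.
 */
static size_t stripes_to_resync(const unsigned long *declared,
                                size_t ndeclared,
                                unsigned long blocks_per_stripe,
                                unsigned long *stripes)
{
    size_t nstripes = 0;

    for (size_t i = 0; i < ndeclared; i++) {
        unsigned long s = declared[i] / blocks_per_stripe;
        size_t j;

        /* Skip stripes already on the list. */
        for (j = 0; j < nstripes; j++)
            if (stripes[j] == s)
                break;
        if (j == nstripes && nstripes < MAX_STRIPES)
            stripes[nstripes++] = s;
    }
    return nstripes;
}
```

In the real implementation, each stripe found this way would be the subject of a BIO_SYNCRAID request rather than an entry in an array.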

The result should be the best of both worlds. The cost of adding one more block to the filesystem journal is far less than doing that journaling at the RAID layer; Jody claims a 3-5% performance hit, as compared to 30% with the MD write-intent bitmap mechanism. But resynchronization after a crash should be quite fast, since it need only look at the parts of the array which were under active modification at the time. The only problem is that it requires the addition of specific support at the filesystem layer, so each filesystem must be modified separately. How this technique could be used in a filesystem which works without journaling (Btrfs comes to mind) would also have to be worked out.

There's one other little problem as well. This work was done at Sun as a way of improving performance with the Lustre filesystem. But Jody notes:

Unfortunately, we have determined that these patches are NOT useful to Lustre. Therefore I will not be doing any more work on them. I am sending them now in case they are useful as a starting point for someone else's work.

So this patch series has been abandoned for now. It seems like this functionality should be useful to software RAID users, so, hopefully, somebody will pick the patches up and carry them forward. In the absence of a new developer, software RAID administrators will continue to face an unhappy choice well into the future.

Videobuf: buffer management for V4L2 drivers

By Jonathan Corbet
November 23, 2009
Video4Linux2 (V4L2) drivers provide access to webcams, TV tuners, and TV output devices, among others. LWN covered much of the V4L2 API in 2007; sadly, like almost any two-year-old kernel documentation, those articles are now somewhat obsolete. One thing that has not changed, though, is that V4L2 drivers tend to be moderately complex beasts; they are usually an assembly of two or three drivers working together to operate hardware with a number of complex operating modes. Despite all that, a V4L2 driver has, at its core, a relatively simple task: fill large buffers in memory with video frames and transfer them between the device and user space. The management of these buffers, while subject to complexities of its own, tends to be quite similar from one driver to the next. It would be nice if there were a support layer which could be used to handle much of this task in a standard way.

The good news is that such a layer does exist; it's called videobuf. The bad news is that the documentation for this code is...not quite what it could be. This article is an attempt to fill that gap; a version of it will eventually be submitted for inclusion into the kernel documentation directory.

The videobuf layer functions as a sort of glue layer between a V4L2 driver and user space. It handles the allocation and management of buffers for the storage of video frames. There is a set of functions which can be used to implement many of the standard POSIX I/O system calls, including read(), poll(), and, happily, mmap(). Another set of functions can be used to implement the bulk of the V4L2 ioctl() calls related to streaming I/O, including buffer allocation, queueing and dequeueing, and streaming control. Using videobuf imposes a few design decisions on the driver author, but the payback comes in the form of reduced code in the driver and a consistent implementation of the V4L2 user-space API.

Buffer types

Not all video devices use the same kind of buffers. In fact, there are (at least) three common variations:

  • Buffers which are scattered in both the physical and (kernel) virtual address spaces. All user-space buffers are like this, but it makes great sense to allocate kernel-space buffers this way as well when it is possible. Unfortunately, it is not always possible; working with this kind of buffer normally requires hardware which can do scatter/gather DMA operations.

  • Buffers which are physically scattered, but which are virtually contiguous; buffers allocated with vmalloc(), in other words. These buffers are just as hard to use for DMA operations, but they can be useful in situations where DMA is not available but virtually-contiguous buffers are convenient.

  • Buffers which are physically contiguous. Allocation of this kind of buffer can be unreliable on fragmented systems, but simpler DMA controllers cannot deal with anything else.

Videobuf can work with all three types of buffers, but the driver author must pick one at the outset and design the driver around that decision.

Data structures, callbacks, and initialization

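Depending on which type of buffer is being used, the driver should include one of the following files:

    <media/videobuf-dma-sg.h>        /* Physically scattered */
    <media/videobuf-vmalloc.h>       /* vmalloc() buffers */
    <media/videobuf-dma-contig.h>    /* Physically contiguous */
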
The driver's data structure describing a V4L2 device should include a struct videobuf_queue instance for the management of the buffer queue, along with a list_head for the queue of available buffers. There will also need to be an interrupt-safe spinlock which is used to protect (at least) the queue.

The next step is to write four simple callbacks to help videobuf deal with the management of buffers:

    struct videobuf_queue_ops {
        int (*buf_setup)(struct videobuf_queue *q,
                         unsigned int *count, unsigned int *size);
        int (*buf_prepare)(struct videobuf_queue *q,
                           struct videobuf_buffer *vb,
                           enum v4l2_field field);
        void (*buf_queue)(struct videobuf_queue *q,
                          struct videobuf_buffer *vb);
        void (*buf_release)(struct videobuf_queue *q,
                            struct videobuf_buffer *vb);
    };

buf_setup() is called early in the I/O process, when streaming is being initiated; its purpose is to tell videobuf about the I/O stream. The count parameter will be a suggested number of buffers to use; the driver should check it for rationality and adjust it if need be. As a practical rule, a minimum of two buffers are needed for proper streaming, and there is usually a maximum (which cannot exceed 32) which makes sense for each device. The size parameter should be set to the expected (maximum) size for each frame of data.
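The count-and-size logic can be modeled outside the kernel. The sketch below is a user-space stand-in, not a real callback: it takes a hypothetical struct demo_format in place of the struct videobuf_queue pointer, and the demo_ names and limits are made up for illustration (the limits simply mirror the two-buffer minimum and 32-buffer ceiling mentioned above).

```c
#define DEMO_MIN_BUFFERS   2   /* minimum needed for streaming, per the text */
#define DEMO_MAX_BUFFERS  32   /* upper bound mentioned in the text */

/* Stand-in for the driver's notion of the current frame format. */
struct demo_format {
    unsigned int width;
    unsigned int height;
    unsigned int bytes_per_pixel;
};

/*
 * Model of a buf_setup() callback: sanity-check the suggested buffer
 * count and report the maximum size of one frame.  Returns 0 on success.
 */
static int demo_buf_setup(const struct demo_format *fmt,
                          unsigned int *count, unsigned int *size)
{
    *size = fmt->width * fmt->height * fmt->bytes_per_pixel;
    if (*size == 0)
        return -1;   /* a real driver would return a negative errno */

    if (*count < DEMO_MIN_BUFFERS)
        *count = DEMO_MIN_BUFFERS;
    if (*count > DEMO_MAX_BUFFERS)
        *count = DEMO_MAX_BUFFERS;
    return 0;
}
```

A device with limited memory would likely pick a smaller maximum than 32; the clamp is where that policy would live.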

Each buffer (in the form of a struct videobuf_buffer pointer) will be passed to buf_prepare(), which should set the buffer's size, width, height, and field fields properly. If the buffer's state field is VIDEOBUF_NEEDS_INIT, the driver should pass it to:

    int videobuf_iolock(struct videobuf_queue *q, struct videobuf_buffer *vb,
                        struct v4l2_framebuffer *fbuf);

Among other things, this call will usually allocate memory for the buffer. Finally, the buf_prepare() function should set the buffer's state to VIDEOBUF_PREPARED.
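The sequence of steps in buf_prepare() is easy to get wrong, so here is a small user-space model of it. Everything in it is a stand-in: the demo_ types replace struct videobuf_buffer and its state values, and demo_iolock() merely marks the buffer as having memory where a driver would call videobuf_iolock().

```c
/* Stand-ins for the videobuf buffer states named in the text. */
enum demo_state { DEMO_NEEDS_INIT, DEMO_PREPARED, DEMO_QUEUED, DEMO_DONE };

struct demo_buffer {
    enum demo_state state;
    unsigned int width, height;
    unsigned long size;
    int has_memory;     /* set once backing memory is attached */
};

/* Stand-in for videobuf_iolock(): attach memory, 0 on success. */
static int demo_iolock(struct demo_buffer *vb)
{
    vb->has_memory = 1;
    return 0;
}

/* Model of buf_prepare(): fill in geometry, lock memory once, mark ready. */
static int demo_buf_prepare(struct demo_buffer *vb, unsigned int width,
                            unsigned int height, unsigned int bytes_per_pixel)
{
    vb->width = width;
    vb->height = height;
    vb->size = (unsigned long)width * height * bytes_per_pixel;

    if (vb->state == DEMO_NEEDS_INIT) {
        int ret = demo_iolock(vb);

        if (ret)
            return ret;     /* leave the state untouched on failure */
    }
    vb->state = DEMO_PREPARED;
    return 0;
}
```

Note that the memory step is skipped for a buffer which has been through this function before; only the geometry and state are refreshed.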

When a buffer is queued for I/O, it is passed to buf_queue(), which should put it onto the driver's list of available buffers and set its state to VIDEOBUF_QUEUED. Note that this function is called with the queue spinlock held; if it tries to acquire it as well things will come to a screeching halt. Yes, this is the voice of experience. Note also that videobuf may wait on the first buffer in the queue; placing other buffers in front of it could again gum up the works. So use list_add_tail() to enqueue buffers.
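The ordering requirement is the part worth illustrating. The following user-space sketch reimplements just enough of a circular doubly-linked list to show tail insertion; struct demo_list and the demo_ functions are invented stand-ins for the kernel's list_head, list_add_tail(), and a driver's buf_queue() callback, and no locking is shown because the real callback runs with the queue spinlock already held.

```c
#include <stddef.h>

/* Minimal circular doubly-linked list, modeled on the kernel's list_head. */
struct demo_list {
    struct demo_list *next, *prev;
};

static void demo_list_init(struct demo_list *head)
{
    head->next = head->prev = head;
}

/* Insert entry just before head, i.e. at the tail of the queue. */
static void demo_list_add_tail(struct demo_list *entry, struct demo_list *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

struct demo_buffer {
    struct demo_list queue;   /* kept first so the cast below is valid */
    int index;
    int queued;               /* stand-in for state == VIDEOBUF_QUEUED */
};

/* Model of buf_queue(): the caller already holds the queue lock. */
static void demo_buf_queue(struct demo_list *active, struct demo_buffer *vb)
{
    demo_list_add_tail(&vb->queue, active);
    vb->queued = 1;
}

/* Oldest buffer on the list, or NULL if the list is empty. */
static struct demo_buffer *demo_first_buffer(struct demo_list *active)
{
    if (active->next == active)
        return NULL;
    return (struct demo_buffer *)active->next;
}
```

Because new buffers always go on the tail, the buffer that videobuf may be waiting on stays at the head until the driver fills it.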

Finally, buf_release() is called when a buffer is no longer intended to be used. The driver should ensure that there is no I/O active on the buffer, then pass it to the appropriate free routine(s):

    /* Scatter/gather drivers */
    int videobuf_dma_unmap(struct videobuf_queue *q,
                           struct videobuf_dmabuf *dma);
    int videobuf_dma_free(struct videobuf_dmabuf *dma);

    /* vmalloc drivers */
    void videobuf_vmalloc_free(struct videobuf_buffer *buf);

    /* Contiguous drivers */
    void videobuf_dma_contig_free(struct videobuf_queue *q,
                                  struct videobuf_buffer *buf);

One way to ensure that a buffer is no longer under I/O is to pass it to:

    int videobuf_waiton(struct videobuf_buffer *vb, int non_blocking, int intr);

Here, vb is the buffer, non_blocking indicates whether non-blocking I/O should be used (it should be zero in the buf_release() case), and intr controls whether an interruptible wait is used.

File operations

At this point, much of the work is done; much of the rest is slipping videobuf calls into the implementation of the other driver callbacks. The first step is in the open() function, which must initialize the videobuf queue. The function to use depends on the type of buffer used:

    void videobuf_queue_sg_init(struct videobuf_queue *q,
                                struct videobuf_queue_ops *ops,
                                struct device *dev,
                                spinlock_t *irqlock,
                                enum v4l2_buf_type type,
                                enum v4l2_field field,
                                unsigned int msize,
                                void *priv);

    void videobuf_queue_vmalloc_init(struct videobuf_queue *q,
                                     struct videobuf_queue_ops *ops,
                                     void *dev,
                                     spinlock_t *irqlock,
                                     enum v4l2_buf_type type,
                                     enum v4l2_field field,
                                     unsigned int msize,
                                     void *priv);

    void videobuf_queue_dma_contig_init(struct videobuf_queue *q,
                                        struct videobuf_queue_ops *ops,
                                        struct device *dev,
                                        spinlock_t *irqlock,
                                        enum v4l2_buf_type type,
                                        enum v4l2_field field,
                                        unsigned int msize,
                                        void *priv);

In each case, the parameters are the same: q is the queue structure for the device, ops is the set of callbacks as described above, dev is the device structure for this video device, irqlock is an interrupt-safe spinlock to protect access to the data structures, type is the buffer type used by the device (cameras will use V4L2_BUF_TYPE_VIDEO_CAPTURE, for example), field describes which field is being captured (often V4L2_FIELD_NONE for progressive devices), msize is the size of any containing structure used around struct videobuf_buffer, and priv is a private data pointer which shows up in the priv_data field of struct videobuf_queue. Note that these are void functions which, evidently, are immune to failure.
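The msize parameter deserves a short illustration. Drivers typically wrap struct videobuf_buffer in a structure of their own and pass the size of the wrapper as msize, so that videobuf allocates room for the driver's per-buffer state as well. The sketch below shows the idea in user space with stand-in types; the demo_ names and the dma_desc_index field are hypothetical, and placing the embedded buffer first in the wrapper is an assumption based on common driver practice.

```c
#include <stddef.h>

/* Stand-in for struct videobuf_buffer; the real one has many more fields. */
struct demo_videobuf_buffer {
    unsigned int width, height;
};

/* Driver-specific wrapper; sizeof() of this is what msize would carry. */
struct demo_driver_buffer {
    struct demo_videobuf_buffer vb;   /* embedded core buffer, kept first */
    unsigned int dma_desc_index;      /* hypothetical per-buffer driver state */
};

/* Same idea as the kernel's container_of() macro. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the wrapper from the core buffer pointer videobuf hands back. */
static struct demo_driver_buffer *
to_driver_buffer(struct demo_videobuf_buffer *vb)
{
    return demo_container_of(vb, struct demo_driver_buffer, vb);
}
```

Each callback receives only the core buffer pointer; a helper like to_driver_buffer() is how the driver gets back to its own data.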

The void *dev typing in videobuf_queue_vmalloc_init() is a bit of an anomaly; your editor has submitted a patch to change it to struct device *. The ops pointer also should really be const; that will probably change in 2.6.33.

V4L2 capture drivers can be written to support either of two APIs: the read() system call and the rather more complicated streaming mechanism. As a general rule, it is necessary to support both to ensure that all applications have a chance of working with the device. Videobuf makes it easy to do that with the same code. To implement read(), the driver need only make a call to one of:

    ssize_t videobuf_read_one(struct videobuf_queue *q,
                              char __user *data, size_t count,
                              loff_t *ppos, int nonblocking);

    ssize_t videobuf_read_stream(struct videobuf_queue *q,
                                 char __user *data, size_t count,
                                 loff_t *ppos, int vbihack, int nonblocking);

Either one of these functions will read frame data into data, returning the amount actually read; the difference is that videobuf_read_one() will only read a single frame, while videobuf_read_stream() will read multiple frames if they are needed to satisfy the count requested by the application. A typical driver read() implementation will start the capture engine, call one of the above functions, then stop the engine before returning (though a smarter implementation might leave the engine running for a little while in anticipation of another read() call happening in the near future).
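The difference between the two calls can be modeled in user space. This sketch only imitates the described behavior (one frame versus as many frames as the request needs); it ignores blocking, the vbihack parameter, and file position, and struct demo_frame along with both demo_ functions are invented for the purpose.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for one captured frame sitting in a buffer. */
struct demo_frame {
    const char *data;
    size_t len;
};

/* Like videobuf_read_one(): never returns more than a single frame. */
static size_t demo_read_one(const struct demo_frame *frames, size_t nframes,
                            char *out, size_t count)
{
    size_t n;

    if (nframes == 0)
        return 0;
    n = frames[0].len < count ? frames[0].len : count;
    memcpy(out, frames[0].data, n);
    return n;
}

/* Like videobuf_read_stream(): keeps going until count is satisfied. */
static size_t demo_read_stream(const struct demo_frame *frames, size_t nframes,
                               char *out, size_t count)
{
    size_t done = 0;

    for (size_t i = 0; i < nframes && done < count; i++) {
        size_t n = frames[i].len;

        if (n > count - done)
            n = count - done;
        memcpy(out + done, frames[i].data, n);
        done += n;
    }
    return done;
}
```

With two four-byte frames and a six-byte request, the first function hands back four bytes while the second returns all six, crossing the frame boundary to do so.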

The poll() function can usually be implemented with a direct call to:

    unsigned int videobuf_poll_stream(struct file *file,
				      struct videobuf_queue *q,
				      poll_table *wait);

Note that the actual wait queue eventually used will be the one associated with the first available buffer.

When streaming I/O is done to kernel-space buffers, the driver must support the mmap() system call to enable user space to access the data. In many V4L2 drivers, the often-complex mmap() implementation simplifies to a single call to:

    int videobuf_mmap_mapper(struct videobuf_queue *q,
			     struct vm_area_struct *vma);

Everything else is handled by the videobuf code.

The release() function requires two separate videobuf calls:

    void videobuf_stop(struct videobuf_queue *q);
    int videobuf_mmap_free(struct videobuf_queue *q);

The call to videobuf_stop() terminates any I/O in progress - though it is still up to the driver to stop the capture engine. The call to videobuf_mmap_free() will ensure that all buffers have been unmapped; if so, they will all be passed to the buf_release() callback. If buffers remain mapped, videobuf_mmap_free() returns an error code instead. The purpose is clearly to cause the closing of the file descriptor to fail if buffers are still mapped, but every driver in the 2.6.32 kernel cheerfully ignores its return value.

ioctl() operations

The V4L2 API includes a very long list of driver callbacks to respond to the many ioctl() commands made available to user space. A number of these - those associated with streaming I/O - turn almost directly into videobuf calls. The relevant helper functions are:

    int videobuf_reqbufs(struct videobuf_queue *q,
                         struct v4l2_requestbuffers *req);
    int videobuf_querybuf(struct videobuf_queue *q, struct v4l2_buffer *b);
    int videobuf_qbuf(struct videobuf_queue *q, struct v4l2_buffer *b);
    int videobuf_dqbuf(struct videobuf_queue *q, struct v4l2_buffer *b,
                       int nonblocking);
    int videobuf_streamon(struct videobuf_queue *q);
    int videobuf_streamoff(struct videobuf_queue *q);
    int videobuf_cgmbuf(struct videobuf_queue *q, struct video_mbuf *mbuf,
                        int count);

So, for example, a VIDIOC_REQBUFS call turns into a call to the driver's vidioc_reqbufs() callback which, in turn, usually only needs to locate the proper struct videobuf_queue pointer and pass it to videobuf_reqbufs(). These support functions can replace a great deal of buffer management boilerplate in a lot of V4L2 drivers.

The vidioc_streamon() and vidioc_streamoff() functions will be a bit more complex, of course, since they will also need to deal with starting and stopping the capture engine. videobuf_cgmbuf(), called from the driver's vidiocgmbuf() function, only exists if the V4L1 compatibility module has been selected with CONFIG_VIDEO_V4L1_COMPAT, so its use must be surrounded with #ifdef directives.

Buffer allocation

Thus far, we have talked about buffers, but have not looked at how they are allocated. The scatter/gather case is the most complex on this front. The driver can leave buffer allocation entirely up to the videobuf layer; in this case, buffers will be allocated as anonymous user-space pages and will be very scattered indeed. If the application is using user-space buffers, no allocation is needed; the videobuf layer will take care of calling get_user_pages() and filling in the scatterlist array.

If the driver needs to do its own memory allocation, it should be done in the vidioc_reqbufs() function, after calling videobuf_reqbufs(). The first step is a call to:

    struct videobuf_dmabuf *videobuf_to_dma(struct videobuf_buffer *buf);

The returned videobuf_dmabuf structure (defined in <media/videobuf-dma-sg.h>) includes a couple of relevant fields:

    struct scatterlist  *sglist;
    int                 sglen;

The driver must allocate an appropriately-sized scatterlist array and populate it with pointers to the pieces of the allocated buffer; sglen should be set to the length of the array.

Drivers using the vmalloc() method need not (and cannot) concern themselves with buffer allocation at all; videobuf will handle those details. The same is true of contiguous-DMA drivers; videobuf will allocate the buffers (with dma_alloc_coherent()) when it sees fit. That means that these drivers may be trying to do high-order allocations at any time, an operation which is not always guaranteed to work. Some drivers play tricks by allocating DMA space at system boot time; videobuf does not currently play well with those drivers.

Filling the buffers

The final part of a videobuf implementation has no direct callback - it's the portion of the code which actually puts frame data into the buffers, usually in response to interrupts from the device. For all types of drivers, this process works approximately as follows:

  1. Obtain the next available buffer and make sure that somebody is actually waiting for it.

  2. Get a pointer to the memory and put video data there.

  3. Mark the buffer as done and wake up the process waiting for it.

Step (1) above is done by looking at the driver-managed list_head structure - the one which is filled in the buf_queue() callback. Because starting the engine and enqueueing buffers are done in separate steps, it's possible for the engine to be running without any buffers available - in the vmalloc() case especially. So the driver should be prepared for the list to be empty. It is equally possible that nobody is yet interested in the buffer; the driver should not remove it from the list or fill it until a process is waiting on it. That test can be done by examining the buffer's done field (a wait_queue_head_t structure) with waitqueue_active().

For scatter/gather drivers, the needed memory pointers will be found in the scatterlist structure described above. Drivers using the vmalloc() method can get a memory pointer with:

    void *videobuf_to_vmalloc(struct videobuf_buffer *buf);

For contiguous DMA drivers, the function to use is:

    dma_addr_t videobuf_to_dma_contig(struct videobuf_buffer *buf);

The contiguous DMA API goes out of its way to hide the kernel-space address of the DMA buffer from drivers.

The final step is to set the size field of the relevant videobuf_buffer structure to the actual size of the captured image, set state to VIDEOBUF_DONE, then call wake_up() on the done queue. At this point, the buffer is owned by the videobuf layer and the driver should not touch it again.
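Put together, the three steps look roughly like the following user-space model of a frame-complete handler. None of it is kernel code: struct demo_buffer replaces struct videobuf_buffer, the waiters flag stands in for a waitqueue_active() test on the done queue, the woken flag marks where wake_up() would be called, and a simple singly-linked list replaces the driver's list_head.

```c
#include <stddef.h>
#include <string.h>

enum demo_state { DEMO_QUEUED, DEMO_DONE };

struct demo_buffer {
    enum demo_state state;
    int waiters;            /* stand-in for waitqueue_active(&vb->done) */
    int woken;              /* set where a driver would call wake_up() */
    size_t size;            /* bytes of valid frame data */
    unsigned char mem[64];  /* stand-in for the buffer memory */
    struct demo_buffer *next;
};

/*
 * Model of a frame-complete interrupt handler.  *head is the driver's
 * list of queued buffers.  Returns 1 if a buffer was filled, 0 if the
 * frame had to be dropped.
 */
static int demo_frame_ready(struct demo_buffer **head,
                            const unsigned char *frame, size_t len)
{
    struct demo_buffer *vb = *head;

    /* Step 1: need a buffer, and somebody waiting for it. */
    if (vb == NULL || !vb->waiters)
        return 0;
    if (len > sizeof(vb->mem))
        len = sizeof(vb->mem);
    *head = vb->next;
    vb->next = NULL;

    /* Step 2: copy the frame data into the buffer memory. */
    memcpy(vb->mem, frame, len);

    /* Step 3: record the size, mark it done, wake the waiter. */
    vb->size = len;
    vb->state = DEMO_DONE;
    vb->woken = 1;
    return 1;
}
```

The early return covers both awkward cases described above: an empty list, and a buffer that nobody is waiting on yet, which stays on the list untouched.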


This article has covered most aspects of the videobuf API. Developers who are interested in more information can go into the relevant header files; there are a few low-level functions declared there which have not been talked about here. Also worthwhile is the vivi driver (drivers/media/video/vivi.c), which is maintained as an example of how V4L2 drivers should be written. Vivi only uses the vmalloc() API, but it's good enough to get started with. Note also that all of these calls are exported GPL-only, so they will not be available to non-GPL kernel modules.

Patches and updates

Build system

  • nconfig v7 (November 25, 2009)

Page editor: Jonathan Corbet

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds