
Kernel development

Brief items

Kernel release status

The current 2.6 prepatch is 2.6.23-rc5, released by Linus on August 31, immediately prior to his departure for the kernel summit. It contains a fair number of fixes; this kernel is stabilizing but has some ground yet to cover before it is ready for release.

There have been a very small number of fixes added to the mainline git repository since the -rc5 release.

The current -mm tree is 2.6.23-rc4-mm1. Recent changes to -mm include some significant internal sysfs implementation changes, some filesystem API changes, the sysctl() re-deprecation patches, and the container memory controller patches.

The current stable 2.6 kernel is 2.6.22.6, released with a couple dozen fixes on August 30.


Kernel development news

Quotes of the week

If we're going to send a message to sysadmins, we shouldn't force them to go through a git bisection search and a lkml discussion to receive it!
-- Andrew Morton

Judging by the number and severity of the bug reports which seem to be flying past, 2.6.23 isn't exactly imminent.
-- Andrew Morton


LinuxConf.eu: Documentation and user-space API design

By Jonathan Corbet
September 3, 2007
Michael Kerrisk, the Linux man page maintainer since 2004, gave a talk on the value of documentation during the first day of LinuxConf Europe 2007. While documents are useful for end users trying to get their job done, this use was not Michael's focus; instead, he talked about how documentation can help in the creation of a better kernel in the first place. The writing of documents, he says, reveals bugs and bad interface designs before they become part of a released kernel. And that can help to prevent a great deal of pain for both kernel and user-space developers.

Michael presented three examples to show how the process of writing documentation can turn up bugs:

  • The inotify interface was added to the 2.6.13 kernel as an improved way for an application to request notifications when changes are made to directories and files. Around 2.6.16, Michael got around to writing a manual page for this call, only to find that one option (IN_ONESHOT) had never worked. Once the problem was found it was quickly fixed, but that did not happen until an effort was made to document the interface.

  • splice() was added in 2.6.17. Michael found that it was easy to write programs which would go into an unkillable hang; clogging the system with hung processes was also easy. Again, once the problem was found, it was fixed quickly.

  • The timerfd() interface, as merged in 2.6.22, did not work properly. It also has some design issues, as were covered in this article.

The existence of buggy interfaces in stable kernel releases is, says Michael, a result of insufficient testing of -rc kernels during the development process. Better documentation can help with this problem. Better documentation can also help with the API design process in the first place. Designing good APIs is hard, and is made harder by the fact that, for the kernel, API design mistakes must be maintained forever. So anything which can help in the creation of a good API can only be a good thing.

The characteristics of a good API include simplicity, ease of use, generality, consistency with other interfaces, and integration with other interfaces. Bad designs, instead, lack those characteristics. As an example, Michael discussed the dnotify interface - the previous attempt to provide a file-change notification service. Dnotify suffered as a result of its use of signals, which never leads to an easy-to-use interface. It was only able to monitor directories, not individual files. It required keeping an open file descriptor, thus preventing the unmounting of any filesystem where dnotify was in use. And the amount of information provided to applications was limited.

Another example was made of the mlock() and remap_file_pages() system calls. Both have start and length arguments to specify the range of memory to be affected. The mlock() interface rounds the length argument up to the next page, while remap_file_pages() rounds it down. The two system calls also differ in when they apply the length argument. As a result, a call like:

    mlock (4000, 6000);

will affect bytes 0..12287, while

    remap_file_pages (4000, 6000, ...);

affects bytes 0..4095. This sort of inconsistency makes these system calls harder for developers to use.
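The arithmetic behind those two ranges can be sketched as follows. This is an illustrative model only, not the kernel's code: it assumes 4096-byte pages, and the names sketch_mlock_range() and sketch_remap_range() are hypothetical.

```c
/* Assumption: 4 KiB pages, as implied by the byte ranges quoted above. */
#define SKETCH_PAGE_SIZE 4096UL

struct sketch_range {
    unsigned long first;   /* first affected byte */
    unsigned long last;    /* last affected byte (inclusive) */
};

/* mlock()-style: round start down, add the length, then round the end up. */
struct sketch_range sketch_mlock_range(unsigned long start, unsigned long len)
{
    unsigned long begin = start & ~(SKETCH_PAGE_SIZE - 1);
    unsigned long end = (start + len + SKETCH_PAGE_SIZE - 1)
                        & ~(SKETCH_PAGE_SIZE - 1);
    struct sketch_range r = { begin, end - 1 };
    return r;
}

/*
 * remap_file_pages()-style: round start down, round the length down,
 * and only then add the two together. Assumes len >= SKETCH_PAGE_SIZE.
 */
struct sketch_range sketch_remap_range(unsigned long start, unsigned long len)
{
    unsigned long begin = start & ~(SKETCH_PAGE_SIZE - 1);
    unsigned long size = len & ~(SKETCH_PAGE_SIZE - 1);
    struct sketch_range r = { begin, begin + size - 1 };
    return r;
}
```

With start 4000 and length 6000, the first function yields 0..12287 and the second 0..4095, matching the figures in the text.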

Many bits can be expended on how bad these interfaces are. But, asks Michael, was it all really the developer's fault? Or did the lack of a review process contribute to these problems?

Many of these difficulties result from the fact that the designers of system call interfaces (kernel hackers) are not generally the users of those interfaces. To make things better, Michael put forward a proposal to formalize the system call interface development process. He acknowledges that this sort of formalization is a hard sell, but the need to create excellent interfaces from the first release makes it necessary. So he would like to see a formal signoff requirement for APIs - though who would be signing off on them was not specified. There would need to be a design review, full documentation of the interface, and a test suite before this signoff could happen. The test suite would need to be at least partially written by people other than the developer, who will never be able to imagine all of the crazy things users might try to do with a new interface.

The documentation requirement is an important part of the process. Writing documentation for an interface will often reveal bugs or bad design decisions. Beyond that, good documentation makes the nature of the interface easier for others to understand, resulting in more review and more testing of a proposed interface. Without testing from application developers, problems in new APIs will often not be found until after they have been made part of a stable kernel release, and that is too late.

In the question period, it was asserted that getting application developers to try out system calls in -rc kernels is always going to be hard. An alternative idea, which has been heard before, would be to mark new system calls as "experimental" for a small number of kernel release cycles after they are first added. Then it would be possible to try out new system calls without having to run development kernels and still have a chance to influence the final form of the new API. It might be easier to get the kernel developers to agree to this kind of policy than to get them to agree to an elaborate formal review process, but it still represents a policy change which would have to be discussed. That discussion could happen soon; how it goes will depend on just how many developers really feel that there is a problem with how user-space APIs are designed and deployed now.

The next day, Arnd Bergmann gave a talk on how not to design kernel interfaces. Good interfaces, he says, are designed with "taste," but deciding what has taste is not always easy. Taste is subjective and changes over time. But some characteristics of a tasteful interface are clear: simplicity, consistency, and using the right tool for the job. These are, of course, very similar to the themes raised by Michael the day before.

As is often the case, discussion of interface design is most easily done by pointing out the things one should not do. Arnd started in with system calls, which are the primary interface to the kernel. Adding new system calls is a hard thing to do; there is a lot of review which must be gotten through first (though, as discussed above, perhaps it's still not hard enough). But often the alternative to adding system calls can be worse; he raised the hypothetical idea of a /dev/exit device; a process which has completed its work could quit by opening and writing to that device. Such a scheme would allow the elimination of the exit() system call, but it would not be a more tasteful interface by any means.

The ioctl() system call has long been the target of criticism; it is not type safe, it is hard to script, and it is an easy way to sneak in ABI changes without anybody noticing. On the other hand, it is well established, easy to extend, it works in modules, and it can be a good way to prototype system calls. Again, trying to avoid ioctl() can lead to worse things; Arnd presented an example from the InfiniBand code which interprets data written to a special file descriptor to execute commands. The result is essentially ioctl(), but even less clear.

Sockets are a well-established interface which, Arnd says, would never be accepted into the kernel now. They are totally inconsistent with everything else, operate on devices which are not part of the device tree, have read and write calls which are not read() and write(), and so on. Netlink, by adding complexity to the socket interface, did not really help the user-space interface situation in general; its use is, he says, best avoided. But, importantly, it is better to use netlink than to reinvent it. The wireless extensions API was brought up as another example of how not to do things; putting wireless extensions over netlink turned out to be a way of combining the worst features of sockets and ioctl() into a single interface.

The "fashionable" way to design new interfaces now is with virtual filesystems. But troubles can be found there as well. /proc became a sort of dumping ground for new interfaces until the developers began to frown on additions there. Sysfs was meant to solve many of the problems with /proc, but it clearly has not solved the API stability problem. Virtual filesystems may well be the best way to create new interfaces, but there are many traps there.

Finally, there was some talk of designing interfaces to make ABI emulation easy. Arnd suggests that data structures should be the same in both kernel and user space. Avoid long variables, and, whenever possible, avoid pointers as well. Structure padding - either explicit or caused by badly aligned fields - can lead to trouble. And so on.
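As a rough illustration of that advice, here is a minimal sketch of an argument structure which has the same layout for 32-bit and 64-bit callers. The structure, its field names, and the helper are hypothetical and are not taken from the talk; user-space fixed-width types stand in for the kernel's __u32 and __u64.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request structure designed for easy ABI emulation. */
struct sketch_request {
    uint32_t id;        /* fixed-width type instead of "long" */
    uint32_t flags;
    uint64_t buffer;    /* user pointer carried as a 64-bit integer */
    uint32_t length;
    uint32_t reserved;  /* explicit padding; no hidden tail padding */
};

/* The layout is checked at compile time rather than assumed. */
_Static_assert(sizeof(struct sketch_request) == 24,
               "size must not depend on the word size");
_Static_assert(offsetof(struct sketch_request, buffer) == 8,
               "64-bit field must be naturally aligned");

/* Store a pointer in the fixed-width field without truncation. */
uint64_t sketch_ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}
```

Because every field has a fixed size and the 64-bit member sits on an 8-byte boundary, a 32-bit process and a 64-bit kernel would see the same bytes at the same offsets, so no translation layer would be needed for this structure.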

All told, it was a lively session with a great deal of audience participation. There are many user-space interface design mistakes which are part of Linux and must be supported forever. There is also a great deal of interest in avoiding making more of those mistakes in the future. The problem remains a hard one, though, even with the benefit of a great deal of experience.


The many faces of fsck

September 5, 2007

This article was contributed by Valerie Aurora

When people talk about fsck they not only pronounce it in wildly different ways, but they also mean wildly different actions. For example, they might mean "traverse the entire file system looking for obvious errors," "run a full consistency cross-check of file system metadata," "repair corruption from a disk error," "repair half-finished writes left over from a system crash," "reconstruct a consistent file system hierarchy starting from the inodes alone," or "I'm so geeky I think it's funny to say 'fsck' instead of swearing. Is there a new xkcd up yet?" As different as all these meanings are, every one of them (except the last) has been implemented by a program referred to as fsck. The question, "Does this file system require fsck?" then becomes anything from "Does this file system need to check and repair the entire file system after every crash before mounting read-write?" to "Can this file system recover from any disk corruption event while still mounted?" In this article, we'll review the history and the various meanings of that complicated, least-beloved of file system utilities, fsck.

fsck tasks

First, what exactly does fsck - the "file system check" program - do? Many Linux users experience it as that annoying 10 minute delay in booting that happens every 180 days or 30 mounts, whichever comes first (the default ext3 "paranoia" fsck parameters). When we do run fsck, most of us run it in automatic mode. After all, how many of us can out-guess fsck when it comes to repairing internal file system structures? Probably the top 10 developers for each file system, which leaves the other 99.99% of us with the -y switch. But before we can understand the differences between fsck implementations, we have to have some idea of what it does.

The most important job of fsck is to find out whether the file system makes a consistent, correctly formatted whole. This is not as simple as traversing all of the file system and incidentally making sure the metadata is good enough for reading along the way. fsck also has to do more involved cross-checks on the metadata than simply reading it, and make sure that the parts of the file system it believes are unused are in fact unused. This is the difference between having a file system that is consistent enough to read, and one that is consistent enough to write. A file system that can be read may be chock-full of reference count bugs and errors which will only cause trouble when the system attempts to actually change the file system. A car may be in good enough repair to start and idle, but then fall apart once it leaves the garage.

During consistency checking, fsck double-checks the metadata describing which blocks and inodes are free, and which are allocated. Usually, some sort of allocation bitmap or tree of extents is maintained to speed up searching for free blocks or inodes - otherwise, the file system would have to check every file to see if it used a particular block, very slow going indeed. This bitmap is a distilled copy of the metadata in individual block pointers or inodes describing whether a block or inode is in use. The upside of this second copy is speed (or lack of glacial slowness, more accurately); the downside is possible inconsistency. If corruption occurs, the two copies can disagree with each other, leading to further file system corruption. The kinds of errors fsck looks for here are double-use (a block with more than one pointer to it), leaked inodes or blocks (an inode or block is marked as used but nothing refers to it), and disagreement (a block pointer points to a block or a directory entry points to an inode but it is marked as free).

Orphan inodes, inodes marked as allocated but not pointed to by any directory entry, deserve extra discussion. Orphan inodes are surprisingly common, due to a UNIX convention that allows a file to be unlinked (removed from the directory tree) but still open. Many programs create temporary files and unlink them in this way so they are guaranteed to be deleted even if the program doesn't shut down properly. The file system has the honor of implementing this guarantee. Many modern file systems maintain some form of on-disk delete queue - a list of inodes which need to be deleted when their reference count drops - for quick deletion in case of crash, instead of searching the entire file system for orphan inodes. Even journaling file systems must kick-start this deletion after an unclean unmount, though it is not crucial to using the file system immediately.
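The unlink-while-open convention can be demonstrated from user space. The following is a minimal sketch using standard POSIX calls; the function name and the temporary path are illustrative choices, not anything prescribed by a file system.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a temporary file, unlink it while it is still open, and confirm
 * that it remains usable through the descriptor. Returns the inode's link
 * count as reported by fstat(), or -1 if any step fails.
 */
long sketch_unlinked_tempfile_nlink(void)
{
    char path[] = "/tmp/orphan-sketch-XXXXXX";  /* hypothetical location */
    char buf[4];
    struct stat st;
    long nlink = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;

    if (unlink(path) == 0 &&                 /* no directory entry now */
        write(fd, "data", 4) == 4 &&         /* yet the file still works */
        lseek(fd, 0, SEEK_SET) == 0 &&
        read(fd, buf, 4) == 4 &&
        memcmp(buf, "data", 4) == 0 &&
        fstat(fd, &st) == 0)
        nlink = (long)st.st_nlink;

    close(fd);  /* only now can the file system release the inode */
    return nlink;
}
```

On typical local file systems the reported link count is zero while the descriptor is open. If the system crashed at that point, the inode would be left allocated with nothing pointing to it, which is exactly the orphan case described above.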

Free/allocated consistency is particularly hard to check when it comes to blocks. Most file systems do not have any way to have back pointers for blocks to their parent, so the only way to find out if a block is really part of a file is to traverse the entire file system. Detecting duplicate block allocations requires keeping a block allocation bitmap and checking if a block is already marked before marking a block as allocated. Fixing the duplicate allocation requires keeping a list of which inode points to a block, which can take a lot of memory; the ext2/3/4 fsck doesn't record this information until it detects a duplicate block, at which point it starts over and finds this information.
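The bitmap-based duplicate check can be sketched in a few lines. This is a toy model, not the ext2/3/4 fsck code: the block pointers are assumed to have been gathered into a flat array already, and the names and the 64-block limit are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NBLOCKS 64   /* assumption: a tiny 64-block file system */

/* Mark a block as claimed; returns 1 if it was already claimed, else 0. */
int sketch_claim_block(uint8_t *bitmap, unsigned block)
{
    uint8_t mask = (uint8_t)(1u << (block % 8));
    int already = (bitmap[block / 8] & mask) != 0;

    bitmap[block / 8] |= mask;
    return already;
}

/* Count block pointers which refer to a block some earlier pointer claimed. */
size_t sketch_count_duplicates(const unsigned *blocks, size_t n)
{
    uint8_t seen[SKETCH_NBLOCKS / 8] = { 0 };
    size_t dups = 0;

    for (size_t i = 0; i < n; i++) {
        if (blocks[i] >= SKETCH_NBLOCKS)
            continue;   /* out-of-range pointer: a different kind of error */
        dups += (size_t)sketch_claim_block(seen, blocks[i]);
    }
    return dups;
}
```

Note that the bitmap only says that a block is claimed twice, not by whom; finding the owners requires either the per-block owner list mentioned above or a second pass.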

UNIX file systems have the wonderful quality of allowing more than one hard link to an inode (which can be file or directory). The inode is not deleted until all the hard links are gone. Each inode must maintain a link count, and fsck has to check that the number of directory entries referencing an inode is exactly the same as the link count. This is checked by walking the entire directory tree and recording each link to an inode.
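A minimal sketch of that link-count cross-check follows. It assumes the directory walk has already produced a flat list of the inode numbers named by directory entries; the function name and the eight-inode limit are hypothetical.

```c
#include <stddef.h>

#define SKETCH_NINODES 8   /* assumption: a tiny inode table */

/*
 * dirent_inodes: inode number referenced by each directory entry found
 *                while walking the tree.
 * stored_nlink:  link count recorded in each inode (SKETCH_NINODES entries).
 * Returns the number of inodes whose recorded count disagrees with the
 * number of directory entries actually observed.
 */
size_t sketch_count_link_mismatches(const unsigned *dirent_inodes,
                                    size_t ndirents,
                                    const unsigned *stored_nlink)
{
    unsigned observed[SKETCH_NINODES] = { 0 };
    size_t bad = 0;

    for (size_t i = 0; i < ndirents; i++)
        if (dirent_inodes[i] < SKETCH_NINODES)
            observed[dirent_inodes[i]]++;

    for (size_t ino = 0; ino < SKETCH_NINODES; ino++)
        if (observed[ino] != stored_nlink[ino])
            bad++;

    return bad;
}
```

A real checker would then decide how to repair each mismatch, for example by rewriting the stored count to match what the walk observed.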

The structure of the directories in a file system has to obey certain rules. No directory cycles can exist (e.g., directory A -> directory B -> directory A), and each directory must be reachable from the root directory of a file system.

The above are the most important, generic UNIX rules for file system consistency, but there are many more things to check. Each file system then also needs to check the internal structure of its metadata. For example, if the file system uses extents, the file system must check that the extents of a file are correctly formatted and refer to plausible blocks. The superblock and the summaries for groups of blocks must be checked. Some file systems use B-trees extensively and must check them for consistency too, and so forth.

One paper that may help with understanding some of the more subtle issues of file system checking is Fast Consistency Checking for the Solaris File System [PDF]. The authors implement a scheme for fast fsck with relatively minor changes to the Solaris UFS file system, in the process describing the most difficult tasks in file system consistency checking.

Primordial fsck: check the file system and repair in-progress updates

For the purposes of UNIX, the first fsck was designed for the Fast File System. (Original fsck paper in text gzipped format) As is well known, FFS had no formal method of maintaining file system consistency if the file system was not cleanly unmounted. (In fact, in the earliest days, the operator had to sync the file system by hand before shutting the system down.) Many write operations require writing more than one block on disk. If a system crash occurred, some random subset of the outstanding writes would be on disk, and the rest would not. When the system booted again, the file system would be in an inconsistent state and not usable - perhaps an inode had zero links to it, but was still marked as allocated, and therefore could never be freed. As well, corruption might occur for other reasons - a bad disk, or a file system bug - and not be found until the whole file system was checked.

fsck in this earliest incarnation therefore did the following things: It checked the whole file system for inconsistencies, both from an unclean unmount and other sources of corruption, and in the process attempted to repair any inconsistencies it found. (Repair here means, as it does in the rest of the article, returning the file system to a usable consistent state, rather than to some platonic ideal of what the file system would have been without the corruption.) The majority of the inconsistencies were the result of an unclean unmount, and the steps to fixing them were fairly well known. The first use of fsck meant "check the file system and fix any in-progress writes that didn't complete so that the file system can be mounted." This is the use that carried over to the ext2 file system in Linux.

fsck and journaling file systems

Running fsck after every unclean unmount was an unpleasant, time-consuming, and dangerous experience. Many a sysadmin has distinct memories of lines of unintelligible gobbledygook scrolling off the screen, each ending with "Fix? <y>", and a sore finger from holding down the enter key (this was before the -y switch). The new journaling file systems, like XFS, VxFS, Reiserfs, and ext3, made running fsck after an unclean unmount unnecessary.

Journaling file systems keep an on-disk log of write operations to the file system. When the entirety of a write operation is in the log, then the file system begins rewriting the changes to their final location on disk. If the system crashes or something else goes wrong, then the journal entry is still on-disk on the next mount, and the file system will finish replaying the entry, so that the entire self-consistent set of changes to the metadata will go to disk. fsck no longer had to clean up after half-finished writes, and the file system only had to replay the journal after an unclean unmount.
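The replay step can be modeled with a toy in-memory journal. This sketch is not taken from any real file system: the "disk" is an array of integers, and the structures, names, and sizes are all hypothetical.

```c
#include <stddef.h>

#define SKETCH_NBLOCKS     8   /* assumption: tiny "disk" of 8 blocks */
#define SKETCH_MAX_UPDATES 4

struct sketch_update {
    unsigned block;   /* final location of the change */
    int value;        /* new contents */
};

struct sketch_txn {
    struct sketch_update updates[SKETCH_MAX_UPDATES];
    size_t nupdates;
    int committed;    /* set only once the whole entry is in the log */
};

/*
 * Replay the log after a crash: every committed entry is applied in full,
 * and replay stops at the first entry which never finished being logged.
 * Returns the number of entries applied.
 */
size_t sketch_replay(int *disk, const struct sketch_txn *log, size_t ntxn)
{
    size_t applied = 0;

    for (size_t i = 0; i < ntxn; i++) {
        if (!log[i].committed)
            break;   /* incomplete entry: none of it touches the disk */
        for (size_t j = 0; j < log[i].nupdates; j++)
            if (log[i].updates[j].block < SKETCH_NBLOCKS)
                disk[log[i].updates[j].block] = log[i].updates[j].value;
        applied++;
    }
    return applied;
}
```

Applying an entry twice gives the same result as applying it once, which is why a crash during replay itself is harmless in this model: the next mount simply replays again.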

Some file system developers initially took this to mean that no fsck was needed at all. In part, this was true - the system no longer needed to repair half-finished writes by scanning the entire file system, it only had to replay the log. But fixing half-finished writes was only one part of what fsck did. It also checked for and repaired corruption caused by disk errors, file system bugs, administrator error, and any other source. These sources of errors are less common and can be ignored in development, but become a major problem in production use. Nobody wanted to repair a journaling file system by hand any more than any other file system. fsck in the sense of "repair half-completed writes" is unnecessary for journaling file systems (or copy-on-write file systems) but it is still necessary in the sense of "check for and repair file system corruption when something unexpected goes wrong."

The XFS developers decided to head off the fsck naming confusion at the pass and created two commands, xfs_check, which checks the file system for corruption, and xfs_repair, which repairs corruption. The xfs_check man page immediately clears up any confusion about when to run it:

xfs_check checks whether an XFS filesystem is consistent. It is normally run only when there is reason to believe that the filesystem has a consistency problem.

The Reiser version 3 file system, reiserfs, tried something radical and new with its file system check and repair program. It had three major modes: "check," "fix fixable," and "rebuild tree." It divided file system corruption into two kinds: that which was easily fixable, and that which was handled by throwing away most of the metadata and rebuilding the entire file system tree using only the leaves as a starting point (reiserfs puts all of the file system metadata and data into one "balanced tree" structure). The file system repair program only had to deal with a limited set of "easy" corruption repairs. Anything harder just threw away all the "secondary" metadata that could be conflicting and then did a brute force search for the "primary" metadata - the leaves of the tree - and rebuilt a tree out of them. The downside of this approach is that there is no out-of-band signal to say what blocks are metadata and which are not, so it used a magic number present in reiserfs metadata to decide what should be part of the tree. Unfortunately, regular file data can have this magic number, and one common use case was to keep a reiserfs file system image in a file (to mount using the loop device) on a reiserfs file system. The result was that file systems became trivially corrupted during a tree rebuild, since the metadata leaves in the loopback became incorporated into the parent file system.

fsck and soft updates

Soft updates, implemented on FFS for BSD, introduced another meaning of fsck. Soft updates is a method of recording and ordering metadata writes to the disk so that if a system crash occurs, the file system is consistent, with the exception of possible leaked inodes and blocks. When the system boots after an unclean unmount, fsck takes a snapshot of the file system (using an interesting file-based copy-on-write mechanism) and checks it, looking for leaked inodes and blocks. As soon as the snapshot is taken, the system goes forward with the normal boot process, mounting the file system read-write. When fsck finishes, it releases the leaked inodes and blocks it found and lets go of its snapshot. Soft updates gave immediate access to the file system after unclean unmount, without changing the on-disk format of the original FFS file system. fsck in this case meant two things: search for and free leaked inodes and blocks, and repair unexpected corruption.

fsck and copy-on-write file systems

Copy-on-write file systems use an atomic rewrite of the top block in the file system hierarchy to switch between one consistent file system state and another. Copy-on-write file systems may have some form of logging, but this is for the purpose of swiftly recording recent changes to the file system rather than being necessary for the consistency of the file system as in journaling. For example, Write Anywhere File Layout (WAFL) keeps a log of recent writes in an NVRAM device, and ZFS keeps an intent log of recent operations. fsck for copy-on-write file systems is then restricted to the role of checking for and repairing unexpected, unlooked-for file system corruption. fsck is only run as a paranoia check or in response to some sign of corruption.

Not much information is available on the file system check and repair tools for WAFL, other than that they exist. Searching for the file system check and repair tool for WAFL, wafl_check, only gives about 100 results from Google. The online consistency check tool is named wafliron (ha!) and had about 100 results as well.

ZFS's file system check and repair facilities don't follow the usual interface boundaries. The zdb command, used for debugging ZFS, has an undocumented option which will cause it to traverse the entire file system tree, checking checksums as it goes, for a basic consistency check. (Undocumented, because, as the man page says, "The zdb command is used by support engineers to diagnose failures and gather statistics. Since the ZFS file system is always consistent on disk and is self-repairing, zdb should only be run under the direction [of] a support engineer.") Checks and fixes for some problems the developers have observed in the wild are implemented in-kernel. The best known of these in-kernel repair facilities is the automatic repair of a damaged block with two copies, replacing the copy which does not match the block's checksum with the good copy if available. Since all metadata has at least two copies, this fixes most data corruption (the exceptions include things like in-memory block corruption). This collection of features definitely qualifies as file system check and repair, but people will argue whether they should be called fsck or not.
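The two-copy repair idea can be sketched as follows. This is a toy model and not ZFS code: the checksum is a simple stand-in rather than any algorithm ZFS uses, and the block size and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 16   /* assumption: tiny blocks for illustration */

/* Toy running-sum checksum; a stand-in for a real block checksum. */
uint32_t sketch_checksum(const uint8_t *data, size_t len)
{
    uint32_t a = 0, b = 0;

    for (size_t i = 0; i < len; i++) {
        a += data[i];
        b += a;
    }
    return (b << 16) ^ a;
}

/*
 * Compare both copies of a block against the checksum stored elsewhere
 * (for example, in the parent block). Returns 0 if both copies are good,
 * 1 if one bad copy was overwritten from the good one, and -1 if neither
 * copy matches, in which case nothing can be repaired here.
 */
int sketch_self_heal(uint8_t *copy0, uint8_t *copy1, uint32_t expected)
{
    int ok0 = sketch_checksum(copy0, SKETCH_BLOCK_SIZE) == expected;
    int ok1 = sketch_checksum(copy1, SKETCH_BLOCK_SIZE) == expected;

    if (ok0 && ok1)
        return 0;
    if (ok0) {
        memcpy(copy1, copy0, SKETCH_BLOCK_SIZE);
        return 1;
    }
    if (ok1) {
        memcpy(copy0, copy1, SKETCH_BLOCK_SIZE);
        return 1;
    }
    return -1;
}
```

The key design point is that the checksum lives apart from the block it describes, so a damaged block cannot vouch for itself.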

Which fsck do you mean?

We've seen fsck in all its infinite glory, everything from a simple traversal of the file system metadata to groveling through the entire file system cleaning up after a simple-minded file system. Sometimes the names of the programs implementing file system check and repair have improved on unpronounceable fsck (xfs_repair), and sometimes they are just funny (wafliron). One thing is for sure: fsck is an overloaded word, with as many interpretations as there are listeners. Until the file systems community comes up with new terminology, you'll be best served by defining exactly what you mean by "fsck" - "file system consistency check," "file system inconsistency repair," or other unwieldy descriptions.

(Note to readers: Lots more kinds of fsck exist - for example, I didn't cover any flash file systems, which tend to be different in very interesting ways. Please add comments about other kinds of fsck, or details on the ones described here. And of course, your fsck war stories. - V.H.)


Video4Linux2 part 7: Controls

By Jonathan Corbet
August 31, 2007

The LWN.net Video4Linux2 API series.
With the completion of part 6 of this series, we now know how to set up a video device and transfer frames back and forth. It is a well known fact, however, that users can be hard to please; not content with being able to see video from their camera device, they immediately start asking if they can play with parameters like brightness, contrast, and more. These adjustments could be done in the video application, and sometimes they are, but there are advantages to doing them in the hardware itself when the hardware has that capability. A brightness adjustment, for example, might lose dynamic range if done after the fact, but a hardware-based adjustment may retain the full range that the sensor is capable of delivering. Hardware-based adjustments, obviously, will also be easier on the host processor.

Current hardware typically has a wide range of parameters which can be adjusted on the fly. Just how those parameters work varies widely from one device to the next, though. An adjustment as simple as "brightness" could involve a straightforward register setting, or it could require a rather more complex change to an obscure transformation matrix. It would be nice to hide as much of this detail from the application as possible, but there are limits to how much hiding can be done. An overly abstract interface might make it impossible to use the hardware's controls to their fullest potential.

The V4L2 control interface tries to simplify things as much as possible while allowing full use of the hardware. It starts by defining a set of standard control names; these include V4L2_CID_BRIGHTNESS, V4L2_CID_CONTRAST, V4L2_CID_SATURATION, and many more. There are boolean controls for features like white balance, horizontal and vertical mirroring, etc. See the V4L2 API spec for a full list of predefined control ID values. There is also a provision for driver-specific controls, but those, clearly, will generally only be usable by special-purpose applications. Private controls start at V4L2_CID_PRIVATE_BASE and go up from there.

In typical fashion, the V4L2 API provides a mechanism by which an application can enumerate the available controls. To that end, applications make ioctl() calls which end up in a V4L2 driver via the vidioc_queryctrl() callback:

    int (*vidioc_queryctrl)(struct file *file, void *private_data,
                            struct v4l2_queryctrl *qc);

The driver will normally fill in the structure qc with information about the control of interest, or return -EINVAL if that control is not supported. This structure has a number of fields:

    struct v4l2_queryctrl
    {
        __u32                id;
        enum v4l2_ctrl_type  type;
        __u8                 name[32];
        __s32                minimum;
        __s32                maximum;
        __s32                step;
        __s32                default_value;
        __u32                flags;
        __u32                reserved[2];
    };

The control being queried will be passed in via id. As a special case, the application can supply a control ID with the V4L2_CTRL_FLAG_NEXT_CTRL bit set; when this happens, the driver should return information about the next supported control ID higher than the one given by the application. In any case, id should be set to the ID of the control actually being described.

All of the other fields are set by the driver to describe the selected control. The data type of the control is given in type; it can be V4L2_CTRL_TYPE_INTEGER, V4L2_CTRL_TYPE_BOOLEAN, V4L2_CTRL_TYPE_MENU (for a set of fixed choices), or V4L2_CTRL_TYPE_BUTTON (for a control which performs some action when set and which ignores any given value). name describes the control; it could be used in the interface presented to the user by the application. For integer controls (only), minimum and maximum describe the range of values implemented by the control, and step gives the granularity of that range. default_value is exactly what it sounds like - though it is only applicable to integer, boolean, and menu controls. Drivers should set control values to their default at initialization time only; like other device parameters, they should persist across open() and close() calls. As a result, default_value may well not be the current value of the control.

Inevitably, there is a set of flags which further describe a control. V4L2_CTRL_FLAG_DISABLED means that the control is disabled; the application should ignore it. V4L2_CTRL_FLAG_GRABBED means that the control, temporarily, cannot be changed, perhaps because another application has taken it over. V4L2_CTRL_FLAG_READ_ONLY marks controls which can be queried, but which cannot be changed. V4L2_CTRL_FLAG_UPDATE means that adjusting this control may affect the values of other controls. V4L2_CTRL_FLAG_INACTIVE marks a control which is not relevant to the current device configuration. And V4L2_CTRL_FLAG_SLIDER is a hint that applications should represent the control with a slider-like interface.
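To make the field descriptions concrete, here is a user-space mock of a driver-side query function for a device with a single brightness control. It is only a sketch: the structure mirrors the one shown above but uses stdint types, the constants are placeholders rather than the real V4L2 values, and the file and private_data arguments of the real callback are omitted.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder stand-ins for the kernel's control ID, type, and flag. */
#define SKETCH_CID_BRIGHTNESS 1u
#define SKETCH_TYPE_INTEGER   1
#define SKETCH_FLAG_SLIDER    0x20u

struct sketch_queryctrl {
    uint32_t id;
    int      type;
    uint8_t  name[32];
    int32_t  minimum;
    int32_t  maximum;
    int32_t  step;
    int32_t  default_value;
    uint32_t flags;
    uint32_t reserved[2];
};

/* Describe the one control this imaginary device supports. */
int sketch_queryctrl(struct sketch_queryctrl *qc)
{
    if (qc->id != SKETCH_CID_BRIGHTNESS)
        return -EINVAL;   /* unknown control, kernel-style negative errno */

    qc->type = SKETCH_TYPE_INTEGER;
    snprintf((char *)qc->name, sizeof(qc->name), "Brightness");
    qc->minimum = 0;
    qc->maximum = 255;
    qc->step = 1;
    qc->default_value = 128;   /* not necessarily the current value */
    qc->flags = SKETCH_FLAG_SLIDER;
    qc->reserved[0] = qc->reserved[1] = 0;
    return 0;
}
```

An application-side caller would fill in only id before the call and read everything else afterward.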

Applications might just query a few controls which have been specifically programmed in, or they may want to enumerate the entire set. In the latter case, they will start at V4L2_CID_BASE and step through V4L2_CID_LASTP1, perhaps using the V4L2_CTRL_FLAG_NEXT_CTRL flag in the process. For controls of the menu variety (type V4L2_CTRL_TYPE_MENU), applications will probably want to enumerate the possible values as well. The relevant callback is:

    int (*vidioc_querymenu)(struct file *file, void *private_data,
                            struct v4l2_querymenu *qm);

The v4l2_querymenu structure looks like:

    struct v4l2_querymenu
    {
        __u32           id;
        __u32           index;
        __u8            name[32];
        __u32           reserved;
    };

On input, id is the ID value for the menu control of interest, and index is the index value for a specific menu value. Index values start at zero and go up to the maximum value returned from vidioc_queryctrl(). The driver will fill in the name of the menu item; the reserved field should be set to zero.
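A driver-side menu query might look something like the following user-space mock. The control ID, the menu entries, and all of the names are hypothetical; the structure mirrors the one above with stdint types, and the real callback's file and private_data arguments are left out.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_CID_FILTER 2u   /* hypothetical menu control ID */

struct sketch_querymenu {
    uint32_t id;
    uint32_t index;
    uint8_t  name[32];
    uint32_t reserved;
};

/* Hypothetical fixed choices offered by the menu control. */
static const char *const sketch_filter_items[] = { "Off", "50 Hz", "60 Hz" };
#define SKETCH_FILTER_NITEMS \
    (sizeof(sketch_filter_items) / sizeof(sketch_filter_items[0]))

/* Fill in the name of one menu entry, selected by id and index. */
int sketch_querymenu(struct sketch_querymenu *qm)
{
    if (qm->id != SKETCH_CID_FILTER)
        return -EINVAL;   /* not a menu control we know about */
    if (qm->index >= SKETCH_FILTER_NITEMS)
        return -EINVAL;   /* past the last menu entry */

    snprintf((char *)qm->name, sizeof(qm->name), "%s",
             sketch_filter_items[qm->index]);
    qm->reserved = 0;
    return 0;
}
```

In this sketch the corresponding query function for the control would report a minimum of 0 and a maximum of 2, so that applications know which index values to ask about.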

Once the application knows about the available controls, it will likely set about querying and changing their values. The structure used in this case is relatively simple:

    struct v4l2_control
    {
        __u32 id;
        __s32 value;
    };

To query a specific control, an application will set id to the ID of the control and make a call which ends up in the driver as:

    int (*vidioc_g_ctrl)(struct file *file, void *private_data,
                         struct v4l2_control *ctrl);

The driver should set value to the current setting of the control. Of course, it should also be sure that it knows about this specific control and return -EINVAL if the application attempts to query a nonexistent control. Attempts to query button controls should also return -EINVAL.

A request to change a control ends up in:

    int (*vidioc_s_ctrl)(struct file *file, void *private_data,
                         struct v4l2_control *ctrl);

The driver should verify the id and make sure that value falls within the allowed range. If all is well, the new value should be set in the hardware.
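The get and set paths can be sketched together as a user-space mock. Everything here is illustrative: the device structure, the control ID, and the 0..255 range are assumptions, and the choice to reject out-of-range values with -ERANGE is one possible policy (a driver might instead clamp the value).

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_CID_BRIGHTNESS 1u   /* hypothetical control ID */
#define SKETCH_BRIGHT_MIN     0
#define SKETCH_BRIGHT_MAX     255

struct sketch_control {
    uint32_t id;
    int32_t  value;
};

/* Imaginary per-device state; a real driver would mirror a register. */
struct sketch_device {
    int32_t brightness;
};

/* Report the current setting of a known control. */
int sketch_g_ctrl(const struct sketch_device *dev, struct sketch_control *ctrl)
{
    if (ctrl->id != SKETCH_CID_BRIGHTNESS)
        return -EINVAL;
    ctrl->value = dev->brightness;
    return 0;
}

/* Verify the ID and the range, then apply the new value. */
int sketch_s_ctrl(struct sketch_device *dev, const struct sketch_control *ctrl)
{
    if (ctrl->id != SKETCH_CID_BRIGHTNESS)
        return -EINVAL;
    if (ctrl->value < SKETCH_BRIGHT_MIN || ctrl->value > SKETCH_BRIGHT_MAX)
        return -ERANGE;   /* assumption: reject rather than clamp */
    dev->brightness = ctrl->value;   /* hardware write would happen here */
    return 0;
}
```

Because the value is stored in the device structure rather than per open file, it persists across open() and close() calls in this model.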

Finally, it is worth noting that there is a separate extended controls interface supported with V4L2. This API is meant for relatively complex controls; in practice, its main use is for MPEG encoding and decoding parameters. Extended controls can be grouped into classes, and 64-bit integer values are supported. The interface is similar to the regular control interface; see the API specification for details.


Patches and updates

Kernel trees

Linus Torvalds Linux 2.6.23-rc5
Andrew Morton 2.6.23-rc4-mm1
Daniel Walker 2.6.23-rc4-dw1
Greg Kroah-Hartman Linux 2.6.22.6

Architecture-specific

Core kernel code

Matthew Wilcox TASK_KILLED
Matthew Wilcox TASK_KILLABLE version 2
Roman Zippel Really Fair Scheduler

Device drivers

Filesystems and block I/O

Memory management

Security-related

Virtualization and containers

Benchmarks and bugs

Jeffrey W. Baker ZFS, XFS, and EXT4 compared

Miscellaneous

Page editor: Jake Edge


Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds