Kernel development [LWN.net]

Kernel release status

The current stable 2.6 kernel is 2.6.19.1, released on December 11. It contains quite a few fixes, including two for security-related problems.

There have been no 2.6 prepatches over the last week as the 2.6.20 merge window is still open. Quite a few patches have found their way into the mainline git repository; see below for a summary.

The current -mm tree is 2.6.19-mm1. Recent changes to -mm include new debugging features for kmap_atomic(), the user-space driver framework, and a public-key transport mechanism for eCryptfs. Mostly, however, -mm has shrunk considerably as patches have moved into the mainline.

For older 2.6 kernels: Adrian Bunk has released 2.6.16.35 with a few dozen fixes (one security-related). He has also released 2.6.16.36-rc1 with a handful of patches.

Quotes of the week

So let's come out and ban binary modules, rather than pussyfooting around, if that's what we actually want to do.

It comes down to a question of whether we have enough leverage to push them into doing what we want, or not - are we prepared to call their bluff?

The current half-assed solution of chipping slowly away at things by making them EXPORT_SYMBOL_GPL one by one makes little sense - would be better if we actually made an affirmative decision one way or the other.

-- Martin Bligh

Give people 12 months warning (time to work out what they're going to do, talk with the legal dept, etc) then make the kernel load only GPL-tagged modules.

I think I'd favour that. It would aid those people who are trying to obtain device specs, and who are persuading organisations to GPL their drivers.

-- Andrew Morton

I'll whip up such a patch in a bit to spit out kernel log messages whenever such a module is loaded so that people have some warning.

-- Greg Kroah-Hartman

Coming soon to a kernel near you

When last week's summary was written, the process of merging patches for 2.6.20 had just begun. Linus has been busy since then; some of the highlights of what has gone in appear below.

User-visible changes include:

The kernel can now operate with a 300Hz clock rate, which happens to work well with both 25 frame-per-second and 30 FPS video.
New drivers for the real-time clock on OMAP1 chips, the AES engine on Geode LX processors, IBM GXT4500P display cards, DiBcom DiB7000M and DiB7000P demodulators, Pinnacle 400e DVB-S USB receivers, Philips IP3204 I2C controllers, Atmel AT91 I2C controllers, Winbond W83793 hardware monitoring chips, National Semiconductor PC87427 hardware monitoring chips, and Apple Motion Sensors. The "usbvision" driver has been merged, adding support for "more than 50" USB video camera devices. Finally, your editor's drivers for the "Cafe" camera controller and OmniVision OV7670 sensor (both used in the OLPC system) have been merged.
The kernel can now (on i386 systems) be built in an entirely relocatable manner. This feature is most useful for people who install a second kernel in memory to generate crash dumps.
Support for the Liskov-Rivest-Wagner block cipher has been added.
A large set of fixes and enhancements for the GFS2 filesystem have been merged; these include support for TCP connections in the lock manager.
Support for I/O accounting has been improved. There is a new file (/proc/pid/io) where a process's statistics may be read (though the netlink-based taskstats interface remains the preferred way to get this data).
Support for Intel's hardware virtualization features (via /dev/kvm) has been merged.
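
The 300Hz item in the list above comes down to simple divisibility: a frame period is a whole number of timer ticks only when the tick rate is an exact multiple of the frame rate. The following is a small user-space sketch of that arithmetic; the function names are made up for illustration and nothing here is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if one video frame lasts a whole number of timer ticks,
 * i.e. if the tick rate (hz) is an exact multiple of the frame rate (fps). */
static bool frame_is_whole_ticks(unsigned int hz, unsigned int fps)
{
    assert(fps > 0);
    return hz % fps == 0;
}

/* Number of whole timer ticks that fit into one frame period. */
static unsigned int ticks_per_frame(unsigned int hz, unsigned int fps)
{
    assert(fps > 0);
    return hz / fps;
}
```

With hz set to 300, a 25 FPS frame is exactly 12 ticks and a 30 FPS frame is exactly 10 ticks. A 250Hz or 1000Hz clock divides evenly by 25 but not by 30.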

Changes of note for kernel developers include:

Attempts to build the kernel with gcc 4.1.0 will generate warnings, since this compiler is known to make mistakes.
Fixes for code broken by the workqueue changes continue to find their way into the tree. If you have to deal with some of this code, these instructions may prove helpful.
As if the workqueue changes were not enough, there is also now a "freezable" workqueue type, being a workqueue which can be frozen early in the suspend-to-disk process. These queues are created with create_freezeable_workqueue(); there is no single-threaded version available.
There is also a new run_scheduled_work() function which will cause a previously-scheduled work_struct to run synchronously, assuming it has not already run elsewhere.
The internal __alloc_skb() function has a new parameter, being the number of the NUMA node on which the structure should be allocated.
The slab allocator API has been cleaned up somewhat. The old kmem_cache_t typedef is gone; struct kmem_cache should be used instead. The various slab flags (SLAB_ATOMIC, SLAB_KERNEL, ...) were all just aliases for the equivalent GFP_ flags, so they have been removed.
A new boot-time parameter (prof=sleep) causes the kernel to profile the amount of time spent in uninterruptible sleeps.
dma_cache_sync() has a new argument: the device structure for the device doing DMA.
The paravirt_ops code has gone in, making it easier for the kernel to support multiple hypervisors.
The struct path changes have been merged, with changes rippling through the filesystem and device driver subsystems.
The fault injection framework has been merged.
There is now a generic layer for human input devices; the USB HID code has been switched over to this new layer.
A new function, round_jiffies(), rounds a jiffies value up to the next full second (plus a per-CPU offset). Its purpose is to encourage timeouts to occur together, with the result that the CPU wakes up less frequently.
The block "activity function," a callback intended for the implementation of disk activity lights in software, has been removed; nobody was actually using it.
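
To make the round_jiffies() idea from the list above concrete, here is a user-space sketch of "round up to the next full second, plus an offset." It is not the kernel's implementation: DEMO_HZ, the function name, and the way the per-CPU offset is passed in are all assumptions made for illustration, and jiffies wraparound is ignored.

```c
#include <assert.h>

/* Hypothetical tick rate; the real kernel takes HZ from its configuration. */
#define DEMO_HZ 300UL

/*
 * Return the smallest value >= j which has the form k * DEMO_HZ + cpu_skew.
 * cpu_skew stands in for the per-CPU offset mentioned in the text; how the
 * kernel actually chooses that offset is not modeled here.
 */
static unsigned long demo_round_up_to_second(unsigned long j,
                                             unsigned long cpu_skew)
{
    unsigned long rem;

    assert(cpu_skew < DEMO_HZ);
    /* Distance of j past the previous skewed second boundary. */
    rem = (j + DEMO_HZ - cpu_skew) % DEMO_HZ;
    if (rem == 0)
        return j;               /* already on a boundary */
    return j + (DEMO_HZ - rem); /* push forward to the next one */
}
```

Two timeouts requested for jiffies values 1000 and 1100 both come out as 1200 with a skew of zero, so the CPU can service them in a single wakeup.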

The merge window remains open, as of this writing, so expect a few more things to go in before 2.6.20 takes its final shape.

Kevent take 26

Some patches make it into the kernel in something very close to their original form. Others have to go through a few changes first. The all-time record for development iterations may be held by devfs; Richard Gooch had just released the 157th revision when this ill-fated subsystem was merged for 2.3.46. On that scale, Evgeniy Polyakov is just getting started with kevent take 26; even so, the process must be starting to seem like a long one.

In this case, however, the long process can be seen as evidence that the system is working as it should. The kevent subsystem is a major addition to the Linux system call API. Once it goes in, it will have to be supported forever (to a finite-precision arithmetic approximation, at least). Adding a kevent interface with warts, or which does not provide the best performance possible, would be a serious mistake. Nobody wants to be faced with designing and implementing a new event interface in a few years while supporting the old one indefinitely. So it makes sense to go slowly and make sure that things have been thought out well.

The number of people posting comments on the kevent patches has been relatively small; for whatever reason, many normally vocal developers do not seem to have much to say on this new API. Fortunately, Ulrich Drepper (the glibc maintainer) has taken a strong interest in this interface and has pushed hard for the changes he thought were necessary. One gets the sense that Ulrich and Evgeniy have gotten a little tired of each other over the last month or so. But, to their credit, they have stuck to the task. As of this writing, Ulrich has not commented on the version of the API implemented in the "take 26" patch set. It does, however, clearly reflect some of the things he has been asking for.

While Evgeniy has been concerned with getting events out of the kernel, Ulrich has been worried about performance and robustness. So he wanted ways for multi-threaded programs to cancel threads at any time without losing track of which events have been processed. Whenever possible, he would like to be able to process events without involving the kernel at all. And he has pushed strongly for timeout values to be represented in an absolute format. Evgeniy has (a bit grudgingly, at times) addressed most of these wishes.

It is still possible to get a kevent file descriptor by opening /dev/kevent, though that is no longer the only way. The kevent_ctl() system call is still used for the management of events:

    int kevent_ctl(int fd, unsigned int cmd, unsigned int num, 
                   struct ukevent *arg);

With kevent_ctl(), an application can add requests for events, remove them, or modify them in place. There is a new KEVENT_CTL_READY operation which can be used to mark specific events as being "ready" and cause the kernel to wake up one or more processes waiting for events.

The synchronous interface has been changed slightly:

    int kevent_get_events(int ctl_fd, unsigned int min_nr,
                          unsigned int max_nr, struct timespec timeout,
                          struct ukevent *buf, unsigned flags);

The difference is that the timeout value now is a struct timespec. That value is still interpreted as a relative timeout, however, unless flags contains KEVENT_FLAGS_ABSTIME. In the latter case, timeout is an absolute time, and the code will print a warning to the effect that Evgeniy was wrong in believing that nobody would ever want to use absolute times.
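
An application which prefers absolute timeouts will typically compute them by adding a relative delay to some notion of "now." A minimal sketch of that addition follows; the helper name is hypothetical, and the article does not say which clock the kevent code measures absolute times against, so obtaining the base time is left to the caller.

```c
#include <assert.h>
#include <time.h>

#define DEMO_NSEC_PER_SEC 1000000000L

/*
 * Add a relative delay to a base time and return the sum with tv_nsec
 * normalized into the range [0, 1e9). Both inputs must be normalized.
 */
static struct timespec demo_timespec_add(struct timespec base,
                                         struct timespec delta)
{
    struct timespec sum;

    assert(base.tv_nsec >= 0 && base.tv_nsec < DEMO_NSEC_PER_SEC);
    assert(delta.tv_nsec >= 0 && delta.tv_nsec < DEMO_NSEC_PER_SEC);

    sum.tv_sec = base.tv_sec + delta.tv_sec;
    sum.tv_nsec = base.tv_nsec + delta.tv_nsec;
    if (sum.tv_nsec >= DEMO_NSEC_PER_SEC) {
        /* Carry the overflowing nanoseconds into the seconds field. */
        sum.tv_sec += 1;
        sum.tv_nsec -= DEMO_NSEC_PER_SEC;
    }
    return sum;
}
```

The result would be passed as the timeout argument together with KEVENT_FLAGS_ABSTIME in flags; without that flag the same structure is read as a relative delay.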

It is expected, however, that performance-aware applications will use the user-space ring buffer rather than the synchronous interface. That ring buffer is still set up with kevent_init():

    int kevent_init(struct kevent_ring *ring, unsigned int ring_size,
                    unsigned int flags);

The file descriptor argument has been removed from this system call; instead, kevent_init() opens a new file descriptor and passes it back as its return value. Thus, there is no separate need to open /dev/kevent.

The kevent_ring structure has changed a bit since it was last discussed on this page:

    struct kevent_ring
    {
        unsigned int ring_kidx, ring_over;
        struct ukevent event[0];
    };

The new ring_over value counts the number of times that the index into the ring has wrapped around. This parameter is used to ensure that the kernel and the application have the same understanding of the state of the ring buffer before allowing the application to mark events as being consumed.

Waiting for events to arrive in the ring is done with kevent_wait(), which now looks like this:

    int kevent_wait(int ctl_fd, unsigned int num, unsigned int old_uidx,
                    struct timespec timeout, unsigned int flags);

Here, too, the timeout value is a struct timespec, and, once again, absolute timeouts must be marked with the KEVENT_FLAGS_ABSTIME flag. This call will wait until at least one event is ready, then copy up to num events into the ring buffer. The old_uidx is the index of the last event that the calling application knows about; if more events are added between when the application checks and when it calls kevent_wait(), that call will return immediately.

In older versions of the patch, there was no way to tell the kernel when events had been consumed out of the ring; one simply had to hope this had happened by the time the index wrapped around and events were overwritten. In the new version, instead, the application's current position is tracked, and the kernel should be occasionally informed when entries in the ring buffer are freed. That job is done with kevent_commit():

    int kevent_commit(int ctl_fd, unsigned int new_idx, unsigned int over);

Here, new_idx is the index of the last event which has been consumed by the application. The value for over should be the ring_over field from the kevent_ring structure. If that value does not match what the kernel thinks it should be, the attempt to update the index will fail on the assumption that the calling process got scheduled out for a while and things happened while it was not looking. If this check were not made, confusion over index wraparound could cause events to be lost.
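
The role of the wraparound counter can be shown with a toy user-space model. This is only a sketch of the idea described above, not the code in the patch: the structure, its uidx field, and both function names are invented for illustration, and the real checks are likely to be more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the ring bookkeeping; only kidx and over mirror fields
 * named in the article (ring_kidx and ring_over). */
struct demo_ring {
    unsigned int size;  /* number of slots in the ring */
    unsigned int kidx;  /* next slot the "kernel" side will fill */
    unsigned int over;  /* number of times kidx has wrapped around */
    unsigned int uidx;  /* last position committed by the application */
};

/* The "kernel" adds one event, wrapping and counting as needed. */
static void demo_ring_produce(struct demo_ring *r)
{
    assert(r->size > 0);
    if (++r->kidx == r->size) {
        r->kidx = 0;
        r->over++;
    }
}

/* The application reports consumption. The commit is refused if the
 * caller's view of the wrap count is stale. */
static bool demo_ring_commit(struct demo_ring *r, unsigned int new_idx,
                             unsigned int over)
{
    if (over != r->over)
        return false;   /* ring wrapped since the caller looked */
    if (new_idx >= r->size)
        return false;   /* index outside the ring */
    r->uidx = new_idx;
    return true;
}
```

A caller which sampled over before the ring wrapped gets a failure from the commit and must look at the ring again, which is the point of the check.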

As of this writing, the most significant comment is that the name "kevent" suggests an in-kernel API. The commenter (Jeff Garzik) prefers a name like "uevent" (even though there is already a subsystem which returns "uevents" in the kernel). If that remains the most substantial criticism, the kevent code might find its way into the mainline long before Evgeniy breaks the devfs record.

Video4Linux2 part 4: inputs and outputs

The LWN.net Video4Linux2 API series.

This is the fourth article in the irregular LWN series on writing video drivers for Linux. Those who have not yet read the introductory article may want to start there. This week's episode describes how an application can determine which inputs and outputs are available on a given adapter and select between them.

In many cases, a video adapter does not provide a lot of input and output options. A camera controller, for example, may provide the camera and little else. In other cases, however, the situation is more complicated. A TV card might have multiple inputs corresponding to different connectors on the board; it could even have multiple tuners capable of functioning independently. Sometimes those inputs have different characteristics; some might be able to tune to a wider range of video standards than others. The same holds for outputs.

Clearly, for an application to be able to make full use of a video adapter, it must be able to find out about the available inputs and outputs, and it must be able to select the one it wishes to operate with. To that end, the Video4Linux2 API offers three different ioctl() calls for dealing with inputs, and an equivalent three for outputs. Drivers should implement all three (for each functionality supported by the hardware), even though, for simple hardware, the corresponding code can be quite simple. Drivers should also provide reasonable defaults on startup. What a driver should not do, however, is reset input and output information when an application exits; as with other video parameters, these settings should be left unchanged between opens.

Video standards

Before we can get into the details of inputs and outputs, however, we must have a look at video standards. These standards describe how a video signal is formatted for transmission - resolution, frame rates, etc. These standards are usually set by regulatory authorities in each country. There are three major types of video standard used in the world: NTSC (used in North America, primarily), PAL (much of Europe, Africa, and Asia), and SECAM (France, Russia, parts of Africa). There are, however, variations in the standards from one country to the next, and some devices are more flexible than others in the variants they can work with.

The V4L2 layer represents video standards with the type v4l2_std_id, which is a 64-bit mask. Each standard variant is then one bit in the mask. So "standard" NTSC is V4L2_STD_NTSC_M, value 0x1000, but the Japanese variant is V4L2_STD_NTSC_M_JP (0x2000). If a device can handle all variants of NTSC, it can set a standard type of V4L2_STD_NTSC, which has all of the relevant bits set. Similar sets of bits exist for the variants of PAL and SECAM. See this page for a complete list.
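
Since each variant is a single bit, checking whether a device handles a given standard is a mask test. The sketch below uses local stand-ins rather than the real V4L2 header: the two bit values come from the text above, while the DEMO_ names, the typedef, and the helper are made up, and the combined mask is assumed to hold only those two variants (the real V4L2_STD_NTSC may cover more).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t demo_std_id;   /* stand-in for v4l2_std_id */

#define DEMO_STD_NTSC_M     ((demo_std_id)0x1000)
#define DEMO_STD_NTSC_M_JP  ((demo_std_id)0x2000)
/* Assumption: just the two variants named in the text. */
#define DEMO_STD_NTSC       (DEMO_STD_NTSC_M | DEMO_STD_NTSC_M_JP)

/* True if every standard bit in "wanted" is also set in "supported". */
static bool demo_std_supported(demo_std_id supported, demo_std_id wanted)
{
    assert(wanted != 0);
    return (supported & wanted) == wanted;
}
```

A device advertising DEMO_STD_NTSC passes the test for the Japanese variant, while one advertising only DEMO_STD_NTSC_M does not.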

For user space, V4L2 provides an ioctl() command (VIDIOC_ENUMSTD) which allows an application to query which standards are implemented by a device. The driver does not need to answer those queries directly, however; instead, it simply sets the tvnorm field of the video_device structure with all of the standards that it supports. The V4L2 layer will then split out the supported standards for the application. The VIDIOC_G_STD command, used to query which standard is active at the moment, is also handled in the V4L2 layer by returning the value in the current_norm field of the video_device structure. The driver should, at startup, initialize current_norm to reflect reality; some applications will get confused if no standard is set, even though they have not set one.

When an application wishes to request a specific standard, it will issue a VIDIOC_S_STD call, which is passed through to the driver via:

    int (*vidioc_s_std) (struct file *file, void *private_data,
                         v4l2_std_id std);

The driver should program the hardware to use the given standard and return zero (or a negative error code). The V4L2 layer will handle setting current_norm to the new value.
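
As a rough illustration, the body of such a callback might look like the user-space mock below. Everything in it is a stand-in: the device structure and names are hypothetical, the file argument is omitted, and the choice of -EINVAL for an unsupported standard is an assumption rather than something the text above specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t demo_std_id;   /* stand-in for v4l2_std_id */

/* Hypothetical per-device state kept by the mock driver. */
struct demo_std_dev {
    demo_std_id tvnorm;  /* every standard the hardware can handle */
    demo_std_id hw_std;  /* what the pretend hardware is programmed for */
};

/* Mock of a "set standard" callback: validate, then program the device. */
static int demo_s_std(void *private_data, demo_std_id std)
{
    struct demo_std_dev *dev = private_data;

    assert(dev != NULL);
    if (std == 0 || (std & dev->tvnorm) != std)
        return -EINVAL;  /* assumed error code for an unsupported request */
    dev->hw_std = std;   /* a real driver would write device registers here */
    return 0;
}
```

Note that the mock does not touch current_norm; as described above, that bookkeeping belongs to the V4L2 layer.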

The application may want to know what kind of signal the hardware actually sees on its input. The answer can be found with VIDIOC_QUERYSTD, which reaches the driver as:

    int (*vidioc_querystd) (struct file *file, void *private_data,
                            v4l2_std_id *std);

The driver should fill in the value pointed to by std in the greatest detail possible. If the hardware does not provide much information, that value should indicate any of the standards which might be present.

There is one more point worth noting here: all video devices must support (or at least claim to support) at least one standard. Video standards make little sense for camera devices, which are not tied to any specific regulatory regime. But there is no standard for "I'm a camera and can do almost anything you want." So the V4L2 layer has a number of camera drivers which claim to return PAL or NTSC data.

Inputs

A video acquisition application will start by enumerating the available inputs with the VIDIOC_ENUMINPUT command. Within the V4L2 layer, that command will be turned into a call to the driver's corresponding callback:

    int (*vidioc_enum_input) (struct file *file, void *private_data,
                              struct v4l2_input *input);

In this call, file corresponds to the open video device, and private_data is the private field set by the driver. The input structure is where the real information is passed; it has several fields of interest:

__u32 index: the index number of the input the application is interested in; this is the only field which will be set by user space. Drivers should assign index numbers to inputs, starting at zero and going up from there. An application wanting to know about all available inputs will call VIDIOC_ENUMINPUT with index numbers starting at zero and incrementing from there; once the driver returns EINVAL the application knows that it has exhausted the list. Input number zero should exist for all input-capable devices.
__u8 name[32]: the name of the input, as set by the driver. In simple cases, it can simply be "Camera" or some such; if the card has multiple inputs, the name used here should correspond to what is printed by the connector.
__u32 type: the type of input. There are currently only two: V4L2_INPUT_TYPE_TUNER and V4L2_INPUT_TYPE_CAMERA.
__u32 audioset: describes which audio inputs can be associated with this video input. Audio inputs are enumerated by index number just like video inputs (we'll get to audio in another installment), but not all combinations of audio and video can be selected. This field is a bitmask with a bit set for each audio input which works with the video input being enumerated. If no audio inputs are supported, or if only a single input can be selected, the driver can simply leave this field as zero.
__u32 tuner: if this input is a tuner (type is set to V4L2_INPUT_TYPE_TUNER), this field will contain an index number corresponding to the tuner device. Enumeration and control of tuners will be covered in a future installment too.
v4l2_std_id std: describes which video standard(s) are supported by the device.
__u32 status: gives the status of the input. The full set of flags can be found in the V4L2 documentation; in short, each bit set in status describes a problem. These can include no power, no signal, no synchronization lock, or the presence of Macrovision, among other unfortunate events.
__u32 reserved[4]: reserved fields. Drivers should set them to zero.

Normally, the driver will set all of the fields above and return zero. If index is outside the range of supported inputs, -EINVAL should be returned instead; there is not much else that can go wrong in this call.
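
The enumeration protocol can be sketched in user space with a mock driver that has a single camera input and a loop which probes indexes until it is told to stop. The structure below copies the field list above but is a local stand-in, not the kernel's struct v4l2_input; the DEMO_ constants, function names, and the standard value placed in std are illustrative assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_INPUT_TYPE_TUNER  1
#define DEMO_INPUT_TYPE_CAMERA 2

/* Local stand-in mirroring the fields described in the text. */
struct demo_input {
    uint32_t index;
    uint8_t  name[32];
    uint32_t type;
    uint32_t audioset;
    uint32_t tuner;
    uint64_t std;
    uint32_t status;
    uint32_t reserved[4];
};

/* Mock driver callback: one input, index zero, named "Camera". */
static int demo_enum_input(struct demo_input *input)
{
    uint32_t index;

    assert(input != NULL);
    index = input->index;
    if (index != 0)
        return -EINVAL;             /* past the end of the input list */
    memset(input, 0, sizeof(*input));   /* also zeroes the reserved fields */
    input->index = index;
    strcpy((char *)input->name, "Camera");
    input->type = DEMO_INPUT_TYPE_CAMERA;
    input->std = 0x1000;            /* claim "standard" NTSC, as cameras do */
    return 0;
}

/* Application-side loop: probe indexes from zero until the driver fails. */
static unsigned int demo_count_inputs(void)
{
    struct demo_input in;
    unsigned int n = 0;

    for (;;) {
        memset(&in, 0, sizeof(in));
        in.index = n;
        if (demo_enum_input(&in) != 0)
            break;
        n++;
    }
    return n;
}
```

A real application would issue the VIDIOC_ENUMINPUT ioctl() on an open video device instead of calling the mock directly, and would see EINVAL in errno rather than a negative return value.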

When the application wants to change the current input, the driver will receive a call to its vidioc_s_input() callback:

    int (*vidioc_s_input) (struct file *file, void *private_data, 
                           unsigned int index);

The index value has the same meaning as before - it identifies which input is of interest. The driver should program the hardware to use that input and return zero. Other possible return values are -EINVAL (for a bogus index number) or -EIO (for hardware trouble). Drivers should implement this callback even if they only support a single input.

There is also a callback to query which input is currently active:

    int (*vidioc_g_input) (struct file *file, void *private_data, 
                           unsigned int *index);

Here, the driver sets *index to the index number of the currently active input.
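
For simple hardware, this pair of callbacks amounts to a range check and a stored index. A user-space mock, with a hypothetical device structure and function names and with the file argument left out, might look like this:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-device state for the mock driver. */
struct demo_input_dev {
    unsigned int n_inputs;   /* how many inputs the hardware has */
    unsigned int cur_input;  /* index of the currently selected input */
};

/* Mock "set input" callback: reject bogus indexes, otherwise switch. */
static int demo_s_input(void *private_data, unsigned int index)
{
    struct demo_input_dev *dev = private_data;

    assert(dev != NULL);
    if (index >= dev->n_inputs)
        return -EINVAL;
    dev->cur_input = index;  /* a real driver would program the hardware */
    return 0;
}

/* Mock "get input" callback: report the active input through *index. */
static int demo_g_input(void *private_data, unsigned int *index)
{
    struct demo_input_dev *dev = private_data;

    assert(dev != NULL && index != NULL);
    *index = dev->cur_input;
    return 0;
}
```

A failed request leaves the current selection untouched, so a later query still reports the last input which was successfully set.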

Outputs

The process for enumerating and selecting outputs is very similar to that for inputs, so the description here will be a little more brief. The callback for output enumeration looks like this:

    int (*vidioc_enumoutput) (struct file *file, void *private_data,
                              struct v4l2_output *output);

The fields of the v4l2_output structure are:

__u32 index: the index value corresponding to the output. This index works the same way as the input index: it starts at zero and goes up from there.
__u8 name[32]: the name of the output.
__u32 type: the type of the output. The supported output types are V4L2_OUTPUT_TYPE_MODULATOR for an analog TV modulator, V4L2_OUTPUT_TYPE_ANALOG for basic analog video output, and V4L2_OUTPUT_TYPE_ANALOGVGAOVERLAY for analog VGA overlay devices.
__u32 audioset: the set of audio outputs which can operate with this video output.
__u32 modulator: the index of the modulator associated with this device (for those of type V4L2_OUTPUT_TYPE_MODULATOR).
v4l2_std_id std: the video standards supported by this output.
__u32 reserved[4]: reserved fields, should be set to zero.

There are callbacks for getting and setting the current output setting; they mirror the input callbacks:

    int (*vidioc_g_output) (struct file *file, void *private_data, 
                            unsigned int *index);
    int (*vidioc_s_output) (struct file *file, void *private_data, 
                            unsigned int index);

Any device which supports video output should have all three output callbacks defined, even if there is only one possible output.

With these methods in place, a V4L2 application can determine which inputs and outputs are available on a given device and choose between them. The task of determining just what kind of video data flows through those inputs and outputs is rather more complicated, however. The next installment in this series will begin to look at video data formats and how to negotiate a format with user space.

Patches and updates

Chris Wright Linux 2.6.19.1

Andrew Morton 2.6.19-mm1

Con Kolivas 2.6.19-ck2

Adrian Bunk Linux 2.6.16.36-rc1

Adrian Bunk Linux 2.6.16.35

Gerald Schaefer s390: noexec protection

Li Yu sched: A simple priority ceiling framework and an implementation for mutex.

Venkatesh Pallipadi Add not_critical_when_idle mode for generic timers

Evgeniy Polyakov kevent: Generic event handling mechanism.

Siddha, Suresh B Patch: dynticks: idle load balancing

john stultz HZ free ntp

Bryan O'Sullivan [PATCH 1 of 2] Add memcpy_uncached_read, a memcpy that tries to reduce cache pressure

Junio C Hamano GIT 1.4.4.2

Ingo Molnar debug: add sysrq_always_enabled boot option

David Singleton new procfs memory analysis feature

Keiichi KII proposal for dynamic configurable netconsole

Dmitry Torokhov Input patches for 2.6.19

Greg KH HID patches for 2.6.19

Greg KH more Driver core patches for 2.6.19

Mauro Carvalho Chehab V4L/DVB updates

jayakumar.lkml@gmail.com fbdev,mm: hecuba/E-Ink fbdev driver v2

Steve Wise 2.6.20 Chelsio T3 RDMA Driver

Matthew Wilcox Add support for asynchronous scans to libata

Jean Delvare i2c updates for 2.6.20

Jean Delvare hwmon updates for 2.6.20

Andres Salomon psmouse split

Ben Dooks SM501: core (mfd) driver

Chen, Kenneth W optimize o_direct on block device - v3

Michael Halcrow eCryptfs: Public key; transport mechanism

NeilBrown knfsd: Preparation for IPv6 support

Nikolai Joukov RAIF: Redundant Array of Independent Filesystems

Jeff Garzik Delete JFFS (version 1)

KAMEZAWA Hiroyuki [PATCH] virtual memmap on sparsemem v3 [0/4] introduction

Avi Kivity kvm userspace release 6

Zachary Amsden VMI backend for paravirt-ops

Eric W. Biederman tty layer and misc struct pid conversions

Arjan van de Ven Announce: New release of the Linux-ready Firmware Developer Kit

Arjan van de Ven announce: irqbalance 0.55 released

Douglas Gilbert lsscsi version 0.19 beta
