Kernel development

Brief items

Kernel release status

The current 2.6 prepatch is 2.6.21-rc1, released on February 20. "There's a lot of changes, as is usual for an -rc1 thing, but at least so far it would seem that 2.6.20 has been a good base, and I don't think we have anything *really* scary here." Significant changes include the long-awaited dynamic tick patch, better high-resolution timer support, the VMI virtualization interface (now built on top of paravirt_ops), the ALSA "system on chip" layer, lots of new drivers, and more. See the short-form changelog for details, or the full changelog for lots of details.

As of this writing, a few hundred patches have found their way into the mainline git repository since -rc1 was released. Most of them are in the Video4Linux subsystem, adding ASUS P7131 remote control support, BTTV cropping support, a big update to the pvrusb2 WinTV driver, a new MSI Mega Sky 580 driver, and quite a bit more.

The current -mm tree is 2.6.20-mm2. Recent changes to -mm include Xen DomU support, lguest, Blackfin architecture support, more workqueue changes, POSIX listio completion support for asynchronous I/O, utrace (a new tracing mechanism meant to replace ptrace()), and the kernel markers patch.

Stable kernel updates: 2.6.20.1, 2.6.19.4, and 2.6.18.7 were all released on February 20 with a single patch: a fix for the NFS ACL denial of service vulnerability. Larger updates for 2.6.18 and 2.6.19 (probably the last stable updates for both of those kernels) are currently in the works, with a likely release around the 23rd or 24th.

2.6.16.41 was released on February 18 with about a dozen fixes.

Kernel development news

More changes for 2.6.21

With the release of 2.6.21-rc1, the merge window for this kernel development cycle is now closed. Most of the major 2.6.21 changes were covered here last week, but a number of significant changes did get into the mainline between then and the closing of the window. They are:

  • The VMI virtualization interface has been merged. VMI is a generic hypervisor interface; it is (now) built on top of paravirt_ops and provides a higher level of functionality.

  • The clocksource and dynamic tick patches have been merged.

  • Various improvements to the kernel's support for Sony laptops.

  • The deprecated ACPI "hotkey" driver has been removed.

  • Version 1 of the JFFS filesystem has been removed.

  • The audit subsystem has a "lockdown" mode where further configuration changes cannot be made.

  • A simple driver allowing Blackberry devices to be charged from a Linux system's USB port has been merged.

  • A big ARM update has been merged with oprofile support for ARMv6 processors, kexec() support, support for a number of new board and processor variants, and more.

  • The v9fs (Plan 9) filesystem has seen a number of improvements, mostly in the form of better caching.

  • The SYSV shared memory code has been reworked for more sane internal file usage and easier integration into the ongoing containers / namespaces work.

  • A driver for the Silicon Motion SM501 "multimedia companion" chip has been added.

Now the stabilization period begins, with the final 2.6.21 release due around the beginning of May.

The managed resource API

The device resource management patch was discussed here in January. That patch has now been merged for the 2.6.21 kernel. Since the API is now set - at least, as firmly as any in-kernel API is - it seems like a good time for a closer look at this new interface.

The core idea behind the resource management interface is that remembering to free allocated resources is hard. It appears to be especially hard for driver writers who, justly or not, have a reputation for adding more than their fair share of bugs to the kernel. And even the best driver writers can run into trouble in situations where device probing fails halfway through; the recovery paths may be there in the code, but they tend not to be well tested. The result of all this is a fair number of resource leaks in driver code.

To address this problem, Tejun Heo created a new set of resource allocation functions which track allocations made by the driver. These allocations are associated with the device structure; when the driver detaches from the device, any left-over allocations are cleaned up. The resource management interface is thus similar to the talloc() API used by the Samba hackers, but it is adapted to the kernel environment and covers more than just memory allocations.

Starting with memory allocations, though, the new API is:

    void *devm_kzalloc(struct device *dev, size_t size, gfp_t gfp);
    void devm_kfree(struct device *dev, void *p);

In a pattern we'll see repeated below, the new functions are similar to kzalloc() and kfree() except for the new names and the addition of the dev argument. That argument is necessary for the resource management code to know when the memory can be freed. If any memory allocations are still outstanding when the associated device is removed, they will all be freed at that time.

Note that there is no managed equivalent to kmalloc(); if driver writers cannot be trusted to free memory, it seems, they cannot be trusted to initialize it either. There are also no managed versions of the page-level or slab allocation functions.
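The tracking idea can be illustrated outside the kernel. What follows is a minimal user-space sketch, not the kernel's implementation: struct toy_device, toy_devm_kzalloc(), toy_devm_kfree(), and toy_device_detach() are hypothetical names invented for this example, and calloc() stands in for the kernel allocator. It shows only the bookkeeping: each allocation is linked to the device, and anything still on the list at detach time is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One tracked allocation; the caller's memory follows the header. */
struct toy_res {
	struct toy_res *next;
	max_align_t data[];	/* keeps the payload suitably aligned (C11) */
};

/* Hypothetical stand-in for struct device: only the resource list. */
struct toy_device {
	struct toy_res *resources;	/* most recent allocation first */
	size_t nr_resources;
};

/* Zeroed allocation recorded against dev, in the spirit of devm_kzalloc(). */
static void *toy_devm_kzalloc(struct toy_device *dev, size_t size)
{
	struct toy_res *res = calloc(1, sizeof(*res) + size);

	if (!res)
		return NULL;
	res->next = dev->resources;
	dev->resources = res;
	dev->nr_resources++;
	return res->data;
}

/* Early, explicit free, in the spirit of devm_kfree(). */
static void toy_devm_kfree(struct toy_device *dev, void *p)
{
	struct toy_res **link;

	for (link = &dev->resources; *link; link = &(*link)->next) {
		if ((void *)(*link)->data == p) {
			struct toy_res *res = *link;

			*link = res->next;
			dev->nr_resources--;
			free(res);
			return;
		}
	}
	assert(!"pointer was not allocated against this device");
}

/* Driver detach: free whatever is left; returns how many were freed. */
static size_t toy_device_detach(struct toy_device *dev)
{
	size_t freed = 0;

	while (dev->resources) {
		struct toy_res *res = dev->resources;

		dev->resources = res->next;
		free(res);
		freed++;
	}
	dev->nr_resources = 0;
	return freed;
}
```

In this model a probe routine which fails halfway through needs no unwinding code of its own; the caller runs toy_device_detach() and every outstanding allocation goes away.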

Managed versions of a subset of the DMA allocation functions have been provided:

    void *dmam_alloc_coherent(struct device *dev, size_t size,
			      dma_addr_t *dma_handle, gfp_t gfp);
    void dmam_free_coherent(struct device *dev, size_t size, void *vaddr,
			    dma_addr_t dma_handle);
    void *dmam_alloc_noncoherent(struct device *dev, size_t size,
			         dma_addr_t *dma_handle, gfp_t gfp);
    void dmam_free_noncoherent(struct device *dev, size_t size, void *vaddr,
			       dma_addr_t dma_handle);
    int dmam_declare_coherent_memory(struct device *dev, dma_addr_t bus_addr,
				     dma_addr_t device_addr, size_t size, 
				     int flags);
    void dmam_release_declared_memory(struct device *dev);
    struct dma_pool *dmam_pool_create(const char *name, struct device *dev,
				      size_t size, size_t align,
				      size_t allocation);
    void dmam_pool_destroy(struct dma_pool *pool);

All of these functions have the same arguments and functionality as their dma_* equivalents, but they will clean up the DMA areas on device shutdown. One still has to hope that the driver has ensured that no DMA remains active on those areas, or unpleasant things could happen.

There is a managed version of pci_enable_device():

    int pcim_enable_device(struct pci_dev *pdev);

There is no pcim_disable_device(), however; code should just use pci_disable_device() as usual. A new function:

    void pcim_pin_device(struct pci_dev *pdev);

will cause the given pdev to be left enabled even after the driver detaches from it.

The patch makes the allocation of I/O memory regions with pci_request_region() managed by default - there is no pcim_ version of that interface. The higher-level allocation and mapping interfaces do have managed versions:

    void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, 
                             unsigned long maxlen);
    void pcim_iounmap(struct pci_dev *pdev, void __iomem *addr);

For the allocation of interrupts, the managed API is:

    int devm_request_irq(struct device *dev, unsigned int irq,
                         irq_handler_t handler, unsigned long irqflags,
                         const char *devname, void *dev_id);
    void devm_free_irq(struct device *dev, unsigned int irq, void *dev_id);

For these functions, the addition of a struct device argument was required.

There is a new set of functions for the mapping of I/O ports and memory:

    void __iomem *devm_ioport_map(struct device *dev, unsigned long port,
			          unsigned int nr);
    void devm_ioport_unmap(struct device *dev, void __iomem *addr);
    void __iomem *devm_ioremap(struct device *dev, unsigned long offset,
			       unsigned long size);
    void __iomem *devm_ioremap_nocache(struct device *dev, 
                                       unsigned long offset,
				       unsigned long size);
    void devm_iounmap(struct device *dev, void __iomem *addr);

Once again, these functions required the addition of a struct device argument for the managed form.

Finally, for those using the low-level resource allocation functions, the managed versions are:

    struct resource *devm_request_region(struct device *dev,
				         resource_size_t start,
					 resource_size_t n, 
					 const char *name);
    void devm_release_region(struct device *dev, resource_size_t start,
                             resource_size_t n);
    struct resource *devm_request_mem_region(struct device *dev,
				             resource_size_t start,
					     resource_size_t n, 
					     const char *name);
    void devm_release_mem_region(struct device *dev, resource_size_t start,
                                 resource_size_t n);

The resource management layer includes a "group" mechanism, accessed via these functions:

    void *devres_open_group(struct device *dev, void *id, gfp_t gfp);
    void devres_close_group(struct device *dev, void *id);
    void devres_remove_group(struct device *dev, void *id);
    int devres_release_group(struct device *dev, void *id);

A group can be thought of as a marker in the list of allocations associated with a given device. Groups are created with devres_open_group(), which can be passed an id value to identify the group or NULL to have the ID generated on the fly; either way, the resulting group ID is returned. A call to devres_close_group() marks the end of a given group. Calling devres_remove_group() causes the system to forget about the given group, but does nothing with the resources allocated within the group. To remove the group and immediately free all resources allocated within that group, devres_release_group() should be used.

The group functions seem to be primarily aimed at mid-level code such as the bus layers. When bus code tries to attach a driver to a device, for example, it can open a group; should the driver attach fail, the group can be used to free up any resources allocated by the driver.
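The "marker in the list" description lends itself to a small model. This is a user-space sketch under simplifying assumptions, not the devres code: the fixed-size array, struct toy_device, and all of the toy_* functions are made up for illustration. Open and close markers are stored in the same list as the resources, so releasing a group means freeing whatever sits between its two markers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_ENTRIES 32

enum toy_kind { TOY_RES, TOY_GROUP_OPEN, TOY_GROUP_CLOSE };

struct toy_entry {
	enum toy_kind kind;
	void *ptr;	/* the allocation for TOY_RES, the group id for markers */
};

/* Hypothetical stand-in for struct device: entries in allocation order. */
struct toy_device {
	struct toy_entry list[TOY_MAX_ENTRIES];
	size_t len;
};

static int toy_push(struct toy_device *dev, enum toy_kind kind, void *ptr)
{
	if (dev->len == TOY_MAX_ENTRIES)
		return -1;
	dev->list[dev->len].kind = kind;
	dev->list[dev->len].ptr = ptr;
	dev->len++;
	return 0;
}

/* Tracked, zeroed allocation. */
static void *toy_alloc(struct toy_device *dev, size_t size)
{
	void *p = calloc(1, size);

	if (p && toy_push(dev, TOY_RES, p) < 0) {
		free(p);
		return NULL;
	}
	return p;
}

static int toy_open_group(struct toy_device *dev, void *id)
{
	return toy_push(dev, TOY_GROUP_OPEN, id);
}

static int toy_close_group(struct toy_device *dev, void *id)
{
	return toy_push(dev, TOY_GROUP_CLOSE, id);
}

/* Index of the first marker of the given kind and id, or dev->len. */
static size_t toy_find(const struct toy_device *dev, enum toy_kind kind,
		       const void *id)
{
	size_t i;

	for (i = 0; i < dev->len; i++)
		if (dev->list[i].kind == kind && dev->list[i].ptr == id)
			return i;
	return dev->len;
}

/*
 * Forget the markers of group id. If free_res is set, also free every
 * entry between them (up to the end of the list if the group was never
 * closed). Returns the number of resources freed, or -1 if no such group.
 */
static int toy_drop_group(struct toy_device *dev, const void *id, int free_res)
{
	size_t open = toy_find(dev, TOY_GROUP_OPEN, id);
	size_t close, i, out = 0;
	int freed = 0;

	if (open == dev->len)
		return -1;
	close = toy_find(dev, TOY_GROUP_CLOSE, id);

	for (i = 0; i < dev->len; i++) {
		struct toy_entry e = dev->list[i];

		if (i == open || i == close)
			continue;
		if (free_res && i > open && i < close) {
			if (e.kind == TOY_RES) {
				free(e.ptr);
				freed++;
			}
			continue;	/* nested markers are dropped too */
		}
		dev->list[out++] = e;
	}
	dev->len = out;
	return freed;
}

static int toy_remove_group(struct toy_device *dev, const void *id)
{
	return toy_drop_group(dev, id, 0);
}

static int toy_release_group(struct toy_device *dev, const void *id)
{
	return toy_drop_group(dev, id, 1);
}
```

Bus code following the pattern described above would open a group before calling the driver's probe routine, call toy_release_group() if the probe fails, and call toy_remove_group() if it succeeds, leaving the resources to be freed at detach time.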

There are not many users of this new API in the kernel now. That may change over time as driver writers become aware of these functions, and, perhaps, as the list of managed allocation types grows. The reward for switching over to managed allocations should be more robust and simpler code as current failure and cleanup paths are removed.

A new Intel wireless driver

Almost exactly one year ago, Intel announced the ipw3945 project - a free driver for its 3945ABG wireless adapters. This move was welcomed as a refreshing change from the usual mode of operation in the wireless area, which tends to involve binary-only drivers. Even so, this driver was greeted with some complaints; in particular, the binary-only "regulatory daemon" was not a popular idea, despite the fact that it ran entirely in user space. The ipw3945 driver was never merged into the mainline kernel.

In many cases, just getting free drivers from companies seems like a lot to ask. Getting them to go back and start over is often out of the question. That is just what Intel has done, however, and, on February 9, the new version of the driver was announced, complete with a shiny new web site. The new driver should prove more popular than the old one was.

The user-space regulatory daemon is no more. Intel's engineers, it seems, have found a way to move the regulatory function into the device's firmware, getting the host processor out of the regulatory compliance business altogether. That is probably a more robust solution in general, even though, strictly speaking, the flexibility of the hardware has been reduced. Most users will likely look at the tradeoff - better regulatory compliance and no binary-only daemon - and like what they see. Of course, those who see binary-only device firmware as an infringement of their freedom will not feel that the situation has improved much.

Another significant change is that the new driver works with the Devicescape 802.11 stack. Devicescape remains the intended direction for wireless networking in the Linux kernel, so the new driver should be more easily integrated. At least, that will be the case once Devicescape gets into the mainline. For now, Linux users wanting to try out the new driver will also have to get a version of the d80211 module (available from the Intel site) and build that for their kernels as well.

That leads to the obvious question: when will Devicescape make it into the mainline kernel? The process of getting that code ready for merging has taken rather longer than desired, but it is still moving forward. The current plan, it seems, is to rebase the Devicescape code to 2.6.21-rc1, once that's released, and get the result included in the -mm kernel. If all goes well, the Devicescape stack might just find its way into 2.6.22. That would be a major step forward for wireless networking in Linux.

Back to the Intel driver: one thing that is still lacking is any sort of hardware documentation. Anybody not working for Intel will be limited in what they can do with this driver by what they can learn from the code itself. Your editor asked Intel about hardware documentation; we were told:

The reality is the driver sources are the programming information for the hardware. As time goes forward we spend some time trying to improve the comments in the headers for the source files to make it more clear what they do and to provide some overviews of theory-of-operation, but there isn't any self-contained accurate document that covers everything you need to know to program and operate the device.

Given the choice between developing code and writing documentation, the Intel hackers went for the code.

Clockevents and dyntick

One of the last patch sets to be merged before the 2.6.21 window closed was the clockevents and dyntick work from the real-time tree. These patches have been in the works for some time, and were originally targeted for merging in 2.6.19. In the process, the developers (primarily Ingo Molnar and Thomas Gleixner) discovered one of the fundamental laws of kernel development: if your patches break Andrew Morton's laptop, they are unlikely to make it into the mainline. That little difficulty has now been overcome, with the result that 2.6.21 will include some interesting core changes.

Dealing with clock devices has traditionally been handled in the kernel's architecture-specific code. The result has been a lot of duplicated code between architectures (there are more architectures than common timer devices) and no uniform interface for the core kernel to make use of these devices. John Stultz's generic time of day infrastructure resolved a number of those problems, at least for the timekeeping task, but anybody who wanted to program timer devices in a more general way still ended up dealing with architecture-specific code.

The "clockevents" patch set finishes this job. At its core, clockevents creates a driver API for devices which can deliver interrupts at a specific time in the future. The API tracks the capabilities of each timer (resolution and whether it can do one-shot or periodic interrupts, for example) and provides a simple interface for arming the timer. This API is defined in the core kernel, with only a low-level driver remaining in the architecture-specific code. The end result is that the kernel now has the means to query and use timer capabilities in an architecture-independent manner.
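To make the shape of such an interface concrete, here is a toy model in user-space C. It is an assumption-laden sketch and not the kernel's definition: struct toy_clock_event_device, the TOY_FEAT_* flags, and the helper functions are all hypothetical, and min_delta_ns is used as a simple stand-in for resolution. The point is only that generic code can inspect capabilities and arm a device through a callback, while the hardware-specific part is reduced to that callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FEAT_PERIODIC 0x1u
#define TOY_FEAT_ONESHOT  0x2u

/* Hypothetical descriptor that a low-level timer driver would fill in. */
struct toy_clock_event_device {
	const char *name;
	unsigned int features;		/* TOY_FEAT_* capability bits */
	uint64_t min_delta_ns;		/* shortest interval the device can do */
	uint64_t programmed_ns;		/* toy "hardware register" */
	int (*set_next_event)(struct toy_clock_event_device *dev,
			      uint64_t delta_ns);
};

/* The only device-specific piece in this model. */
static int toy_hw_program(struct toy_clock_event_device *dev,
			  uint64_t delta_ns)
{
	dev->programmed_ns = delta_ns;
	return 0;
}

/* Generic code: the one-shot capable device with the finest granularity. */
static struct toy_clock_event_device *
toy_pick_oneshot(struct toy_clock_event_device *devs, size_t n)
{
	struct toy_clock_event_device *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!(devs[i].features & TOY_FEAT_ONESHOT))
			continue;
		if (!best || devs[i].min_delta_ns < best->min_delta_ns)
			best = &devs[i];
	}
	return best;
}

/* Generic code: arm a one-shot interrupt delta_ns from now. */
static int toy_arm_oneshot(struct toy_clock_event_device *dev,
			   uint64_t delta_ns)
{
	assert(dev->features & TOY_FEAT_ONESHOT);
	if (delta_ns < dev->min_delta_ns)
		delta_ns = dev->min_delta_ns;	/* clamp to what the device can do */
	return dev->set_next_event(dev, delta_ns);
}
```

Nothing in toy_pick_oneshot() or toy_arm_oneshot() knows which architecture it is running on; that separation is the property the paragraph above describes.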

With the clockevents mechanism in place, it becomes possible to support truly high-resolution timers. When such a timer is requested, all that is required is to pick a suitable clockevent device and arm it for the desired time. These devices can deliver interrupts with a high degree of precision, with the result that kernel timers, too, can offer high precision - a feature which is of clear utility to real-time users (among others).

The periodic timer tick is now implemented with a clockevent as well. It does all of the things the old timer-based interrupt did - updating jiffies, accounting CPU time, etc. - but it is run out of the new infrastructure.

All of this is an improvement, but there is still one thing which could be better: there is no real need for a periodic tick in the system. That is especially true when the processor is idle. An idle CPU can save quite a bit of power, but waking that CPU up 100 times (or more) per second will hurt those power savings considerably. With a flexible timer infrastructure, there is no point in turning the CPU back on until it has something to do. So, when the (i386) kernel goes into its idle loop, it checks the next pending timer event. If that event is further away than the next tick, the periodic tick is turned off altogether; instead, the timer is programmed to fire when the next event comes due. The CPU can then rest unharassed until that time - unless an interrupt comes in first. Once the processor goes out of the idle state, the periodic tick is restored.
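The idle-time decision described here comes down to one comparison. The sketch below models it in user-space C; struct toy_cpu_timer, toy_idle_enter(), and toy_idle_exit() are hypothetical names, times are plain nanosecond counters, and ticks are assumed to fall on exact multiples of the tick period. It is meant to illustrate the logic, not to mirror the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

enum toy_tick_mode { TOY_TICK_PERIODIC, TOY_TICK_STOPPED };

/* Hypothetical per-CPU timer state. */
struct toy_cpu_timer {
	enum toy_tick_mode mode;
	uint64_t next_fire_ns;	/* absolute time of the next timer interrupt */
};

/* First tick boundary strictly after now. */
static uint64_t toy_next_tick(uint64_t now_ns, uint64_t tick_ns)
{
	assert(tick_ns > 0);
	return now_ns - (now_ns % tick_ns) + tick_ns;
}

/*
 * Entering the idle loop: if the next pending timer event lies beyond
 * the next tick, stop the tick and sleep until that event instead.
 */
static void toy_idle_enter(struct toy_cpu_timer *t, uint64_t now_ns,
			   uint64_t tick_ns, uint64_t next_event_ns)
{
	uint64_t next_tick = toy_next_tick(now_ns, tick_ns);

	if (next_event_ns > next_tick) {
		t->mode = TOY_TICK_STOPPED;
		t->next_fire_ns = next_event_ns;
	} else {
		t->mode = TOY_TICK_PERIODIC;
		t->next_fire_ns = next_tick;
	}
}

/* Leaving idle, for a timer or any other interrupt: restore the tick. */
static void toy_idle_exit(struct toy_cpu_timer *t, uint64_t now_ns,
			  uint64_t tick_ns)
{
	t->mode = TOY_TICK_PERIODIC;
	t->next_fire_ns = toy_next_tick(now_ns, tick_ns);
}
```

With a 10ms tick (HZ=100) and nothing due for 375ms, this model takes a single wakeup where a periodic tick would have taken several dozen.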

What's in 2.6.21 is, thus, not a full dynamic tick implementation. Eliminating the tick during idle times is a good step forward, but there is value in getting rid of the tick while the system is running as well - especially on virtualized systems which may be sharing a host with quite a few other clients. The dynamic tick documentation file suggests that the developers have this goal in mind:

The implementation leaves room for further development like full tickless systems, where the time slice is controlled by the scheduler, variable frequency profiling, and a complete removal of jiffies in the future.

So expect some interesting work in the future - the removal of jiffies alone has a number of interesting implications. The developers also have support for the x86_64 and ARM architectures, though that support has not been merged for 2.6.21; MIPS and PowerPC support is in the works as well.

Patches and updates

Kernel trees

Linus Torvalds Linux 2.6.21-rc1 Feb 21
Greg KH Linux 2.6.20.1 Feb 20
Andrew Morton 2.6.20-mm1 Feb 15
Andrew Morton 2.6.20-mm2 Feb 19
Con Kolivas 2.6.20-ck1 Feb 17
Greg KH Linux 2.6.19.4 Feb 20
Greg KH Linux 2.6.18.7 Feb 20
Adrian Bunk Linux 2.6.16.41 Feb 19
Adrian Bunk Linux 2.6.16.41-rc1 Feb 17

Architecture-specific

Jeff Dike UML utrace support, step 1 Feb 21

Core kernel code

Development tools

Junio C Hamano GIT 1.5.0.1 Feb 19
Josef Sipek Guilt v0.20 Feb 21

Device drivers

Documentation

Filesystems and block I/O

Artem Bityutskiy [UBI] Unsorted Block Images Feb 19

Janitorial

Jeff Garzik remove JFFS v1 Feb 19
Adrian Bunk the scheduled eepro100 removal Feb 20

Memory management

Networking

Angelo P. Castellani YeAH-TCP: algorithm implementation Feb 19

Security-related

Virtualization and containers

Miscellaneous

Ian Kent autofs 5.0.1 release Feb 20

Page editor: Jonathan Corbet


Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds