Kernel development [LWN.net]

Kernel release status

The current stable 2.6 kernel is 2.6.14.5, released on December 26. It contains the usual set of fixes, mostly in the networking and SCSI subsystems.

The current 2.6 kernel is 2.6.15, announced by Linus on January 2. The changelog entry for the release says "Hey, it's fifteen years today since I bought the machine that got Linux started. January 2nd is a good date." This release contains a fair number of fixes since -rc7, but no big changes. The 2.6.15 series as a whole has added a big set of 802.11 improvements, hotplug memory support, much-improved NTFS support, much-improved CIFS support, the open-iSCSI initiator, shared subtrees, a new, IPv6-capable netfilter connection tracking implementation, and much more. The long-format changelog has the details. See also LWN's earlier Kernel Page coverage of these features as they were added, and the KernelNewbies Linux Changes wiki.

The floodgates have not yet opened for the 2.6.16 development cycle, so there is no pile of pending patches in the mainline git repository as of this writing. There have also been no -mm kernel releases since December 14.

The current 2.4 prepatch is 2.4.33-pre1; Marcelo launched the 2.4.33 cycle on December 29. This prepatch includes some security fixes, some networking work, and, it is said, the last ever big SATA update for 2.4.

Quote of the week

The problem is that if we allow the release to be stalled by bugs it allows one sluggish maintainer to block the entire kernel. At some point in time we do need to just give up and hope that the bug will get fixed in 2.6.x.y or that it'll just magically fix itself later on (this happens, for various reasons).

We get in the situation where lots of people are sitting there with arms folded, complaining about lack of a new kernel release while nobody is actually working on the bugs. Nobody knows why this happens.

-- Andrew Morton

Actually "sprinkling with penguin pee" means that something is blessed (it's like a kernel baptism). Maybe that's not very civilized, but hey, penguins don't have thumbs, and are thus kind of limited in their actions. Don't be speciest.

-- Linus Torvalds

A summary of 2.6.15 API changes

The 2.6.15 kernel is out. The following is a summary of changes to the internal kernel API found in this release, with an emphasis on changes visible to driver writers. This information will be folded into the LWN 2.6 API changes page shortly.

The nested class device patch was merged, allowing class_device structures to have other class_devices as parents. This patch is a hack to make the input subsystem work with sysfs. This code will change again in the future; see Greg Kroah-Hartman's article for more information on what is planned.
The prototypes for the driver model class "interface" methods add() and remove() have changed; there is now a new parameter pointing to the relevant interface structure.
A new platform_driver structure has been added to describe drivers for devices built into the core "platform."
The prototypes for the suspend() and resume() methods in struct device_driver have changed. These methods are also now called only once per event, rather than three times as in previous kernels.
Two new fields have been added to struct dev_pm_info to control how drivers should respond to hardware-generated wakeup events; an earlier article covers the details.
There is a new notification mechanism which lets interested modules know when a USB device is added to (or removed from) the system. It is used by some core code; drivers do not normally need to hook into it.
The gfp_t type is now used for memory allocation flags throughout the kernel. Any function which takes such flags should probably use this type as well.
Code using reader/writer semaphores can now use rwsem_is_locked() to test the (read) state of the semaphore without blocking.
The new vmalloc_node() function allocates memory on a specific NUMA node.
The "reserved" bit for memory pages has, for all practical purposes, been removed.
vm_insert_page() has been added to make it easier for drivers to remap RAM into user space VMAs.
There is a new kthread_stop_sem() function which can be used to stop a kernel thread which might be currently blocked on a specific semaphore.
RapidIO bus support has been merged into the mainline.
The netlink connector mechanism makes netlink code easier to write. Independently, a type-safe netlink interface has been added and is used in parts of the networking subsystem.
These kernel symbols have been unexported and are no longer available to modules: clear_page_dirty_for_io, console_unblank, cpu_core_id, hugetlb_total_pages, idle_cpu, nr_swap_pages, phys_proc_id, reprogram_timer, swapper_space, sysctl_overcommit_memory, sysctl_overcommit_ratio, sysctl_max_map_count, total_swap_pages, user_get_super, uts_sem, vm_acct_memory, and vm_committed_space.
Version 1 of the Video4Linux API is now officially scheduled for removal in July, 2006.
The owner field has been removed from the pci_driver structure.
A number of SCSI subsystem typedefs (Scsi_Device, Scsi_Pointer, and Scsi_Host_Template) have been removed.
The DMA32 memory zone has been added to the x86-64 architecture; its purpose is to make it easy to allocate memory below the 4GB barrier (with the new GFP_DMA32 flag).
A call to rcu_barrier() will block the calling process until all current RCU callbacks have completed.

As can be seen from this list, the kernel API continues to evolve. The claims of certain well-known maintainers notwithstanding, it doesn't look like things will slow down much anytime soon.

Drawing the line on inline

Kernel programmers tend to like inline functions. They resemble C macros, in that they result in code inserted directly into the calling function, with no added function call overhead. But, unlike macros, they offer type checking and the ability to include multiple lines of code without adding a pile of backslashes. In cases where a function is optimized out entirely, an inline function turns into no code at all - a level of efficiency which is hard to beat. And, in some cases, inlining is required; consider, for example, functions which embody special assembly instructions needed by the kernel.

Inline functions also have their costs, however. Their code is duplicated for every call, so inline functions which are called from more than one place make the kernel larger. Increasingly, developers are becoming aware that this size increase carries a performance penalty. As the gap between CPU and memory speeds grows, cache behavior increasingly determines how fast a program runs. So the performance benefits of inline functions are often, at best, illusory, and sometimes negative; a larger kernel will be a slower kernel.

Ingo Molnar recently raised this issue with a set of patches changing how the kernel is built. By turning on unit-at-a-time compilation (which causes gcc to consider an entire file in its optimization decisions) and by turning off forced inlining, he was able to shrink the kernel by 5.3%. Taking things to an extreme, applying these patches to an "allyesconfig" kernel (one with all configuration options turned on) yields a kernel that is nearly 25% smaller. That is, to say the least, a significant size reduction to be achieved by such a small patch. Anybody interested in de-bloating the kernel should be paying attention.

These patches have not been accepted by everybody, however. In particular, turning off forced inlining is controversial. When gcc is not forced to honor the inline keyword, it makes its own decisions, based on the size of the function and how many times it is called. When told to optimize for size, in particular, gcc will have a strong bias against inline functions. This approach yields a significant size reduction, but there is a problem: Linus doesn't trust the gcc maintainers to get the inlining heuristics consistent and correct, and neither does Andrew Morton. Rather than turning off forced inlining and letting gcc figure things out, they would rather go through the code and remove unnecessary inline declarations one by one.

It is true that the kernel has been burned by changes to how gcc handles inline in the past. Since then, gcc seems to have gotten smarter, and one can argue that its maintainers have become more aware of the issues. There is also the little fact that cleaning up the existing inline declarations is not a small job; Ingo says:

There are 22,000+ inline functions in the kernel right now (inlined about a 100,000 times), and we'd have to change _thousands_ of them. They are causing an unjustified code bloat of somewhere around 20-30%.

Arjan van de Ven adds:

The reality is, most driver writers (in fact kernel code writers) tend to overestimate the gain of inline in THEIR code, and to underestimate the cumulative cost of it. Despite what akpm says, I think gcc can make a better judgement than most of these authors (probably including me :). We can remove 6400 now, but a year from now, another 1000 have been added back again I bet.

How all of this will turn out is unclear. Certainly one can expect a higher level of resistance to patches adding inline functions in the future. There is likely to be a long series of de-inlining patches as well. The ability to turn off forced inlining might be added to the build system as an experimental option; some distributors may even decide to use this option for the kernels they ship. But enough developers seem uncomfortable with the idea of turning off forced inlining wholesale that this option may not get beyond the "experimental" stage for some time.

Goodbye semaphores?

In the previous episode, Ingo Molnar had posted his own version of the mutex patch, adding a new synchronization primitive to the kernel. Ingo has continued to refine this patch set at a rapid pace; V10, V11, V12, and V13 have all come and gone, and the current version is V14. This patch set has faced ongoing resistance from Andrew Morton, who did not see the need for a new mutual exclusion mechanism in the kernel. Andrew, instead, wished that the developers would concentrate on fixing the problems with the current semaphore code.

Perhaps the most significant development since then has been a private conversation between Andrew and Ingo. There is, it seems, a plan in place which would replace the current semaphore implementation entirely. Almost all current semaphore users implement simple mutual exclusion, so they would be converted over to the new mutex type directly; an estimated 90% of current semaphore users fall into this category. Of the remaining users, about 90% (roughly 9% of all semaphore users) employ semaphores to signal event completion. Converting those users to the completion type has been ongoing for some time; replacing semaphores would require finishing this job. Finally, an estimated 1% of the semaphores in the kernel are used for their counting feature; they can be converted over to a (not yet posted) architecture-independent counter type.

Once all that work is done, semaphores could be removed from the kernel altogether. Says Andrew: "It's a lot of churn, but we'll end up with a better end result and a somewhat-net-simpler kernel, so I'm happy." Linus, meanwhile, has offered some suggestions for improvements (already incorporated by Ingo) and stated: "At that point I'd like to switch to mutexes just because the code is cleaner!"

Since then, most of the discussion has been concerned with the details of the mutex implementation rather than whether it is fundamentally a good idea or not. The main objections would appear to have been overcome. So, unless something new comes up, it looks like this change is going to happen; the only question is "when." The next couple of weeks will determine whether the mutex code will be part of 2.6.16 or not. Then all that's left is the long task of converting all semaphore users over and, finally, removing the old semaphore code.

Patches and updates

Linus Torvalds Linux 2.6.15

Ingo Molnar 2.6.15-rt1

Con Kolivas 2.6.15-ck1

Linus Torvalds Ho ho ho.. Linux 2.6.15-rc7

Al Viro 2.6.15-rc7-bird1

Ingo Molnar 2.6.15-rc7-rt1

Al Viro 2.6.15-rc6-bird3

Greg KH Linux 2.6.14.5

Con Kolivas 2.6.14-ck8

Marcelo Tosatti Linux 2.4.33-pre1

Willy Tarreau Linux 2.4.32-hf32.1

Sam Ravnborg kbuild + kconfig updates for 2.6.16-rc

Ingo Molnar improve .text size on gcc 4.0 and newer compilers

Roman Zippel NTP4 updates

Alessandro Zummo RTC subsystem

Bryan O'Sullivan [PATCH] Add memcpy32 function

Ingo Molnar mutex subsystem, -V7

Ingo Molnar mutex subsystem, -V8

Ingo Molnar mutex subsystem, -V9

Ingo Molnar mutex subsystem, -V10

Ingo Molnar mutex subsystem, -V11

Ingo Molnar mutex subsystem, -V12

Ingo Molnar mutex subsystem, -V13

Ingo Molnar mutex subsystem, -V14

Rafael J. Wysocki swsusp: userland interface

Peter Williams CPU scheduler: Simplified interactive bonus mechanism

Andi Kleen [RFC] Optimize select/poll by putting small data sets on the stack

Shailabh Nagar Per-task delay accounting

Junio C Hamano GIT 1.0.3

Junio C Hamano GIT 1.0.4 and what is in master and proposed updates.

Junio C Hamano GIT 1.0.5

Junio C Hamano GIT preformatted documentation available.

Catalin Marinas Stacked GIT 0.8

Marco Costalba qgit-1.0rc2

Marco Costalba qgit 1.0

David Howells Add synchronisation primitive testing module

Keith Owens Announce: kdb v4.4 is available for kernel 2.6.15

OGAWA Hirofumi Add new "flush" option

York Liu Intel PRO/Wireless 2200BG 802.11b/g Access Point Project

Bryan O'Sullivan Add memcpy_toio32, a 32-bit MMIO copy routine

James Bottomley SCSI update for 2.6.15

Jeff Garzik 2.6.x net driver updates

Jeff Garzik 2.6.x libata updates

Alan Cox PATCH: Initial PATA driver for SiS chipset IDE

Greg KH USB patches for 2.6.15

Daniel Walker RT: add back plist docs

Adrian Bunk the scheduled removal of obsolete OSS drivers

Greg KH Remove devfs from 2.6.15

Peter Zijlstra vm: page-replace and clockpro

Christoph Lameter Fix the zone reclaim code in 2.6.15

Chris Lowth Linux 2.6 support for "rope" match module

Evgeniy Polyakov New Year Acrypto release.

Anderson Lizardo Add MMC password protection (lock/unlock) support V2

Douglas Gilbert lsscsi-0.16 released
