LWN.net

Kernel development

Brief items

Kernel release status

The current development kernel is 2.6.0-test3; no development kernels have been released over the last week.

-test4 must be getting closer, however; Linus's BitKeeper tree includes several hundred patches, including numerous networking fixes, a new free_netdev() method for networking drivers, a new cpumask_t type for systems with more processors than bits in a long integer, a CONFIG_BROKEN option to control access to drivers known to be broken, a magic, fast new strncpy() implementation, the addition of wireless statistics to sysfs, Twofish and Serpent support for IPSec, the beginnings of Patrick Mochel's power management merge, new sysfs attributes to control scanning of SCSI devices, a number of IDE patches, a new sysfs "attribute group" mechanism which enables the addition of attributes in a safer way and with less boilerplate code, and a mind-numbing array of other fixes and updates.

The current stable kernel is 2.4.21; Marcelo has not released any 2.4.22 release candidates since 2.4.22-rc2 on August 8.


Kernel development news

Alan Cox goes on sabbatical

Alan Cox, a crucial figure in kernel development since almost the beginning, has announced his intention to take a one-year sabbatical from Red Hat and "vanish" from kernel development. He has, apparently, decided to don a tie and go back to school for an MBA. "A few years ago I'd have worried about doing this, the great thing is that with the kernel community we have today I know I'm not a critical cog in the machine. In fact I'm surrounded by people far better than I am and we even have Andrew Morton to keep Linus in check 8)" He'll be around until the end of September. Most of his current projects have been dropped or passed on, but there is an opportunity for somebody who would like to maintain the 2.2 kernel...


The end of /proc/kcore?

Russell King recently posted a patch which makes Linus's kernel tree build properly for the ARM architecture. One of the remaining issues, it seems, was getting /proc/kcore to work. /proc/kcore, of course, is a virtual file which appears to be a core image of the running kernel. It can be used to run debuggers on a running kernel to dump out data structures and such.
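Because /proc/kcore is laid out as an ELF core file, a debugger such as gdb can be pointed at it together with an uncompressed kernel image (for example, `gdb vmlinux /proc/kcore`). The Python sketch below only checks whether a file starts with an ELF header whose type is ET_CORE; the function names are hypothetical, and opening the real /proc/kcore generally requires root privileges.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ET_CORE = 4  # e_type value for core files in the ELF specification


def is_elf_core(header: bytes) -> bool:
    """Return True if `header` begins with an ELF header of type ET_CORE."""
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        return False
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    endian = {1: "<", 2: ">"}.get(header[5])
    if endian is None:
        return False
    # e_type is the 16-bit field right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return e_type == ET_CORE


def kcore_is_core_image(path: str = "/proc/kcore") -> bool:
    # Hypothetical convenience wrapper; reading /proc/kcore normally needs root.
    with open(path, "rb") as f:
        return is_elf_core(f.read(64))
```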

The problem with /proc/kcore is that it has to handle loadable modules, which are placed in address space that is separate from the rest of the kernel. Providing user-space access to that space is easier on some architectures than others. ARM, it seems, is one of the harder architectures to support. So, rather than put in large amounts of effort to produce an ugly solution, Russell simply threw in the towel and decreed that /proc/kcore would not be supported on ARM - at least, in the absence of a volunteer to take on the work.

Linus responded by suggesting that /proc/kcore be removed for all architectures.

Does anybody actually _use_ /proc/kcore? It was one of those "cool feature" things, but I certainly haven't ever used it myself except for testing, and it's historically often been broken after various kernel infrastructure updates, and people haven't complained..

There were a couple of followups from people who occasionally use it, but a notable lack of impassioned defenses for /proc/kcore. The biggest problem, perhaps, is that OProfile uses that file for some information, but there have been suggestions for small changes in how OProfile works to get around that problem. Unless somebody comes up with a stronger argument soon, /proc/kcore is likely to be history.


User-data replication on NUMA systems

Non-Uniform Memory Access (NUMA) systems have the interesting feature that access times to memory vary from one node (group of one or more processors) to another. Each node has local memory, which is relatively fast, but access to another node's memory will be slower. So performance work on NUMA systems tends to emphasize getting rid of cross-node memory traffic.

The latest step in that direction is this patch from Dave Hansen. Dave notes that one source of cross-node traffic is shared user text - things like shared libraries and executable images. Once a particular page from, say, glibc has been faulted into memory, it will exist in a particular node's range. Every other node will have to reach across the system to run code out of that page (though processor caches also figure into this picture, of course). In some cases, such as with the C library, it may well make sense to make a local copy of each page as needed.

To that end, Dave's patch makes some fundamental changes to the kernel's page cache. This change is required, since the cache can now contain more than one memory page for each corresponding file page. So the page cache now contains a set of page_cache_leaf structures, the main component of which is a per-node array of struct page pointers. A page cache lookup will preferentially return a node-local copy of the page if it exists; depending on the situation, it can return a page on a remote node if that's all that is available.
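A minimal Python model of the lookup rule described above might look like this. `PageCacheLeaf`, `Page` and `lookup()` are illustrative names, not Dave's actual code; the per-node array is a plain list indexed by node number, and the remote fallback simply takes the first node that holds a copy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    node: int      # NUMA node whose memory holds this copy
    data: bytes    # page contents (identical across replicas of a text page)


class PageCacheLeaf:
    """Hypothetical model: one file page, with at most one copy per node."""

    def __init__(self, nr_nodes: int) -> None:
        # Per-node array of page pointers; None means "no copy on that node".
        self.pages: List[Optional[Page]] = [None] * nr_nodes

    def lookup(self, local_node: int, allow_remote: bool = True) -> Optional[Page]:
        # Prefer a copy that lives in the caller's own node.
        local = self.pages[local_node]
        if local is not None:
            return local
        if allow_remote:
            # Otherwise fall back to the first remote copy found, if any.
            for page in self.pages:
                if page is not None:
                    return page
        return None
```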

When the kernel handles a page fault for a mapped text page, it insists on a local copy of the page. If no such copy exists, and memory is available, a local copy will be made and added to the page cache. The processor then continues with its work, using the local version of the shared page. The results from a set of quick benchmarks posted with the patch show a performance improvement of 109% to 143%. In other words, it may well be worth the trouble.
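The fault-time policy can be sketched the same way. This hypothetical `fault_text_page()` keeps a node-to-copy dictionary and a per-node free-page count (both assumptions standing in for real kernel accounting); it replicates the page locally when memory allows and otherwise settles for a remote copy.

```python
from typing import Dict


def fault_text_page(copies: Dict[int, bytearray], local_node: int,
                    free_pages: Dict[int, int]) -> int:
    """Hypothetical fault handler for a mapped text page.

    copies:     node id -> that node's copy of the page (already in the cache)
    free_pages: node id -> free pages left on that node (stand-in accounting)
    Returns the node whose copy the faulting CPU should map.
    """
    if not copies:
        raise ValueError("page must already be in the page cache")
    if local_node in copies:
        return local_node                       # local copy already exists
    if free_pages.get(local_node, 0) > 0:
        source = next(iter(copies))             # any existing copy will do
        copies[local_node] = bytearray(copies[source])  # replicate locally
        free_pages[local_node] -= 1
        return local_node
    return next(iter(copies))                   # no local memory: map a remote copy
```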

This patch is not quite ready for prime time, however; Dave notes:

This is still pretty experimental, so don't give it to your bank or anything. I've lightly corrupted data playing with it, although not in at least a week :)

The current code punts on a couple of important issues. When a process tries to write to a file with replicated pages, for example, those pages must be collapsed down to a single copy before the write can be allowed - or inconsistent copies will result. Similarly, if the last writer closes a file, that file suddenly becomes a candidate for replication. The patch, as posted, detects these situations but does not fully implement their resolution. A production-ready patch would also certainly have a mechanism for freeing replicated pages when memory gets tight. Given that this patch is clearly not 2.6 material, however, Dave has a long time to work out those details.
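Collapsing replicas before a write is conceptually simple: keep one copy and free the rest. The sketch below assumes the same kind of node-to-copy dictionary; `collapse_for_write()` is a hypothetical helper, and it ignores details a real implementation would need, such as removing the discarded copies from any page tables that still map them.

```python
from typing import Dict, Tuple


def collapse_for_write(copies: Dict[int, bytearray], keep_node: int) -> Tuple[int, int]:
    """Hypothetical helper: drop all replicas but one before a write is allowed.

    Keeps keep_node's copy if it has one, otherwise the first copy found.
    Returns (node that was kept, number of replica pages freed).
    """
    if not copies:
        raise ValueError("no copies to collapse")
    if keep_node not in copies:
        keep_node = next(iter(copies))
    doomed = [node for node in copies if node != keep_node]
    for node in doomed:
        del copies[node]                        # this replica's memory can be reclaimed
    return keep_node, len(doomed)
```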


Harping on ARP

One of the longer-running current discussions on linux-kernel (and linux-net, and netdev) was started on July 27, when Bas Bloemsaat pointed out a problem that he was having. The Linux implementation of ARP, it seems, is not working as he would like.

ARP, the Address Resolution Protocol, is the means by which IP addresses are translated to physical layer MAC (usually ethernet) addresses. ARP makes local area networks work by enabling systems to find each other. When one system has a packet to transmit to another on the local network, it broadcasts an ARP request packet seeking a MAC address for a given IP address. Some machine (usually the intended recipient) hopefully responds with the corresponding MAC address, and the packet gets sent.
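The request itself is a small fixed-format packet defined in RFC 826. For Ethernet and IPv4 it is 28 bytes, carried in a broadcast frame with EtherType 0x0806. The Python sketch below builds and parses such a frame with the standard struct and socket modules; the helper names are illustrative, and real drivers also pad the frame to the Ethernet minimum length, which this sketch omits.

```python
import socket
import struct

ARP_REQUEST = 1
ARP_FORMAT = "!HHBBH6s4s6s4s"   # RFC 826 layout for Ethernet/IPv4: 28 bytes


def mac_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_request(src_mac: str, src_ip: str, target_ip: str) -> bytes:
    """Ethernet frame asking 'who has target_ip? tell src_ip'."""
    arp = struct.pack(
        ARP_FORMAT,
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6, 4,                         # hardware and protocol address lengths
        ARP_REQUEST,                  # operation
        mac_bytes(src_mac), socket.inet_aton(src_ip),
        bytes(6),                     # target MAC unknown: that is the question
        socket.inet_aton(target_ip),
    )
    # Broadcast destination, our source MAC, EtherType 0x0806 (ARP).
    eth = mac_bytes("ff:ff:ff:ff:ff:ff") + mac_bytes(src_mac) + struct.pack("!H", 0x0806)
    return eth + arp


def parse_arp(frame: bytes) -> dict:
    """Decode the ARP payload that follows a 14-byte Ethernet header."""
    _, _, _, _, oper, sha, spa, _, tpa = struct.unpack(ARP_FORMAT, frame[14:42])
    return {
        "oper": oper,
        "sender_mac": sha.hex(":"),
        "sender_ip": socket.inet_ntoa(spa),
        "target_ip": socket.inet_ntoa(tpa),
    }
```

A reply uses the same layout with operation 2, with the responder's MAC address filled in as the sender hardware address.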

If a Linux system (with a default configuration) receives an ARP request on one of its interfaces, and that request is looking for an IP address assigned to any of the system's interfaces, the system will respond to the ARP request through the interface that received it. This response happens even if the interface involved is not the one to which the requested address has been assigned. Mr. Bloemsaat's problems came about because his system has two interfaces plugged into the same network. Both interfaces receive - and respond to - ARP requests sent on that network. Depending on the order in which the responses are sent, traffic could be directed to the wrong interface.

Mr. Bloemsaat included a patch which restricts ARP responses to the interface actually implementing the requested address. But, over almost a month of discussion, the networking hackers have made it clear that they do not intend to change the way Linux behaves. Their reasoning follows, more or less, these lines:

  • Blocking ARP responses in this way is putting filtering decisions at the wrong layer of the networking code. This sort of action belongs at the netfilter level, rather than down at the device level.

  • Linux's approach to ARP responses is fully compliant with all applicable RFCs.

  • In some situations, responding out of all interfaces is the only way to successfully get communication established.

  • For situations where the default ARP behavior causes problems, the arp_filter sysctl knob can be used to change things. This knob is described in networking/ip-sysctl.txt in the kernel documentation directory. For those who do not want to do this sort of tweaking directly, the ebtables package presents an easier interface.
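The arp_filter decision can be modeled as a routing question: answer a request only if the kernel would send a packet from the requested address back to the requester through the interface on which the request arrived. ip-sysctl.txt notes that this depends on source-based routing being set up. In the sketch below, `Route`, `route_lookup()` and `arp_filter_allows_reply()` are hypothetical, and the routing table is a toy that prefers source-specific routes and then the longest matching prefix; it is not a model of the kernel's full policy-routing machinery.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    dst: str                    # destination prefix, e.g. "192.168.1.0/24"
    dev: str                    # outgoing interface
    src: Optional[str] = None   # optional source prefix (source-based routing)


def route_lookup(routes: List[Route], dst_ip: str, src_ip: str) -> Optional[str]:
    """Toy routing: source-specific routes win, then the longest matching prefix."""
    dst, src = ipaddress.ip_address(dst_ip), ipaddress.ip_address(src_ip)
    best_key, best_dev = None, None
    for r in routes:
        net = ipaddress.ip_network(r.dst)
        if dst not in net:
            continue
        if r.src is not None and src not in ipaddress.ip_network(r.src):
            continue
        key = (r.src is not None, net.prefixlen)
        if best_key is None or key > best_key:
            best_key, best_dev = key, r.dev
    return best_dev


def arp_filter_allows_reply(routes: List[Route], recv_dev: str,
                            sender_ip: str, target_ip: str) -> bool:
    # Answer only if a packet from target_ip back to sender_ip would leave
    # through the interface that received the request.
    return route_lookup(routes, sender_ip, target_ip) == recv_dev
```

On a live system the knob itself is set through files such as /proc/sys/net/ipv4/conf/all/arp_filter, or the per-interface equivalents under /proc/sys/net/ipv4/conf/.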

A lot of the confusion, it seems, comes down to a subtle difference in how systems handle IP addresses. Many (perhaps most) networking implementations treat addresses as "belonging" to the interfaces they are assigned to. With that view of the world, no network interface has any business responding to an ARP request for an address which is assigned elsewhere. Linux, instead, sees IP addresses as a property of the system as a whole. So it makes sense for an interface to respond to a request for one of the system's addresses, even if that address is normally associated with a different interface.
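The practical difference between the two views shows up when two interfaces share one segment, as in Mr. Bloemsaat's setup. This hypothetical `arp_responders()` function assumes every interface hears the broadcast request and reports which ones would answer under each view.

```python
from typing import Dict, List


def arp_responders(interfaces: Dict[str, str], target_ip: str,
                   per_interface: bool) -> List[str]:
    """Interfaces on one LAN that would answer an ARP request for target_ip.

    interfaces:    interface name -> IP address assigned to it
    per_interface: False models the system-wide view (Linux's default);
                   True models the "address belongs to its interface" view.
    """
    system_addrs = set(interfaces.values())
    responders = []
    for name, addr in interfaces.items():
        if per_interface:
            if addr == target_ip:
                responders.append(name)
        elif target_ip in system_addrs:
            responders.append(name)
    return responders
```

Under the system-wide view both interfaces answer, each with its own MAC address, so where the traffic goes depends on which reply the requester acts on; the per-interface view corresponds roughly to the behavior Mr. Bloemsaat's patch asked for.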

The networking RFCs make it clear that either view of IP addresses is legitimate. Armed with that, and their sense of how things should work, the networking hackers are determined to keep Linux's ARP behavior as it is.


Patches and updates

Kernel trees

Build system

Core kernel code

  • Con Kolivas: O16.2int. (August 16, 2003)
  • Con Kolivas: O17int. (August 19, 2003)

Development tools

Device drivers

Janitorial

Memory management

Networking

Architecture-specific

Page editor: Jonathan Corbet

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds