Brief items
The current development kernel is 2.6.0-test3; no development
kernels have been released over the last week.
-test4 must be getting closer, however; Linus's BitKeeper tree includes
several hundred patches, including numerous networking fixes, a new
free_netdev() method for networking drivers, a new
cpumask_t type for systems with more processors than bits in a
long integer, a CONFIG_BROKEN option to control access to drivers
known to be broken, a magic, fast new
strncpy() implementation, the addition of wireless statistics
to sysfs, Twofish and Serpent support for IPSec, the beginnings of Patrick
Mochel's power management merge, new sysfs attributes to control scanning
of SCSI devices, a number of IDE patches, a new sysfs "attribute group"
mechanism which enables the addition of attributes in a safer way and with
less boilerplate code, and a mind-numbing array of other fixes and updates.
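The motivation for a type like cpumask_t can be seen in a small user-space sketch: once there are more processors than bits in a long, the mask has to become an array of longs with helpers to set and test bits. This is not the kernel's implementation; the names sketch_cpumask, SKETCH_NR_CPUS, and the helper functions are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical sizes and names; the kernel's real cpumask_t differs. */
#define SKETCH_NR_CPUS 256
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MASK_LONGS \
    ((SKETCH_NR_CPUS + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* A CPU mask wide enough for SKETCH_NR_CPUS processors. */
struct sketch_cpumask {
    unsigned long bits[SKETCH_MASK_LONGS];
};

/* Clear every bit in the mask. */
static void sketch_cpus_clear(struct sketch_cpumask *m)
{
    memset(m->bits, 0, sizeof(m->bits));
}

/* Mark one CPU as present in the mask. */
static void sketch_cpu_set(struct sketch_cpumask *m, unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    m->bits[cpu / SKETCH_BITS_PER_LONG] |=
        1UL << (cpu % SKETCH_BITS_PER_LONG);
}

/* Return 1 if the CPU is present in the mask, 0 otherwise. */
static int sketch_cpu_isset(const struct sketch_cpumask *m, unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    return (int)((m->bits[cpu / SKETCH_BITS_PER_LONG] >>
                  (cpu % SKETCH_BITS_PER_LONG)) & 1UL);
}
```

Hiding the array behind a type and accessor functions means that callers no longer care whether the mask fits in a single word.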
The current stable kernel is 2.4.21; Marcelo has not released any
2.4.22 release candidates since 2.4.22-rc2 on August 8.
Kernel development news
Alan Cox, a crucial figure in kernel development since, almost, the
beginning, has announced his
intention to take a one-year sabbatical from Red Hat and "vanish" from
kernel development. He has, apparently, decided to don a tie and go back
to school for an MBA. "A few years ago I'd have worried about doing
this, the great thing is
that with the kernel community we have today I know I'm not a critical
cog in the machine. In fact I'm surrounded by people far better than I
am and we even have Andrew Morton to keep Linus in check 8)" He'll
be around until the end of September. Most of his current projects have
been dropped or passed on, but there is an opportunity for somebody
who would like to maintain the 2.2 kernel...
Russell King recently posted a patch which makes Linus's kernel tree build
properly for the ARM architecture. One of the remaining issues, it seems,
was getting /proc/kcore to work.
/proc/kcore, of course, is a virtual file which appears to be a
core image of the running kernel. It can be used to run debuggers on a
running kernel to dump out data structures and such.
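A debugger recognizes a core image by its header. As a rough illustration, assuming that /proc/kcore is presented in ELF core format, the check below looks for the ELF magic number and the "core" file type at the start of a buffer. The function name and the SKETCH_ constants are made up for this example; the numeric values come from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from the ELF specification. */
#define SKETCH_EI_NIDENT 16   /* size of the e_ident[] array */
#define SKETCH_ET_CORE   4    /* e_type value for a core file */

/*
 * Return 1 if buf starts with an ELF header whose type is "core",
 * 0 otherwise. Assumes the data uses the host's byte order, which
 * should hold for a core image of the running kernel.
 */
static int sketch_is_elf_core(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    uint16_t e_type;

    assert(buf != NULL);
    if (len < SKETCH_EI_NIDENT + sizeof(e_type))
        return 0;
    if (memcmp(buf, magic, sizeof(magic)) != 0)
        return 0;
    /* e_type is the 16-bit field immediately after e_ident[]. */
    memcpy(&e_type, buf + SKETCH_EI_NIDENT, sizeof(e_type));
    return e_type == SKETCH_ET_CORE;
}
```

A caller would read the first few bytes of the file (which normally requires root) and pass them to this function.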
The problem with /proc/kcore is that it has to handle loadable
modules, which are placed in address space that is separate from the rest
of the kernel. Providing user-space access to that space is easier on some
architectures than others. ARM, it seems, is one of the harder
architectures to support. So, rather than put in large amounts of effort
to produce an ugly solution, Russell simply threw in the towel and decreed
that /proc/kcore would not be supported on ARM - at least, in the
absence of a volunteer to take on the work.
Linus responded by suggesting that
/proc/kcore be removed for all architectures.
Does anybody actually _use_ /proc/kcore? It was one of those "cool
feature" things, but I certainly haven't ever used it myself except
for testing, and it's historically often been broken after various
kernel infrastructure updates, and people haven't complained..
There were a couple of followups from people who occasionally use it, but a
notable lack of impassioned defenses for /proc/kcore. The biggest
problem, perhaps, is that OProfile uses that file for some information, but
there have been suggestions for small changes in how OProfile works to get
around that problem. Unless somebody comes up with a stronger argument soon,
/proc/kcore is likely to be history.
Non-Uniform Memory Access (NUMA) systems have the interesting feature that
access times to memory vary from one node (group of one or more processors)
to another. Each node has local memory, which is relatively fast, but
access to another node's memory will be slower. So performance work on
NUMA systems tends to emphasize getting rid of cross-node memory traffic.
The latest step in that direction is this
patch from Dave Hansen. Dave notes that one source of cross-node
traffic is shared user text - things like shared libraries and executable
images. Once a particular page from, say, glibc has been faulted into
memory, it will exist in a particular node's range. Every other node will
have to reach across the system to run code out of that page (though
processor caches also figure into this picture, of course). In some cases,
such as with the C library, it may well make sense to make a local copy of
each page as needed.
To that end, Dave's patch makes some fundamental changes to the kernel's
page cache. This change is required, since the cache can now contain more
than one memory page for each corresponding file page. So the page cache
now contains a set of page_cache_leaf structures, the main
component of which is a per-node array of struct page pointers. A
page cache lookup will preferentially return a node-local copy of the page
if it exists; depending on the situation, it can return a page on a remote
node if that's all that is available.
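The lookup policy described above might look something like the following user-space sketch. Only the page_cache_leaf name and the per-node array of page pointers come from the patch description; the sketch_ types, the SKETCH_MAX_NODES limit, and the allow_remote flag are assumptions made for illustration, and the real patch surely differs in detail.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_NODES 4    /* hypothetical node count */

struct sketch_page {          /* stand-in for struct page */
    int node;                 /* node whose memory holds this copy */
};

/* One leaf per file page: at most one copy of the page per node. */
struct sketch_page_cache_leaf {
    struct sketch_page *pages[SKETCH_MAX_NODES];
};

/*
 * Prefer the copy on local_node. If there is none and the caller
 * accepts a remote copy, return the first copy found; else NULL.
 */
static struct sketch_page *
sketch_leaf_lookup(const struct sketch_page_cache_leaf *leaf,
                   int local_node, int allow_remote)
{
    int n;

    assert(local_node >= 0 && local_node < SKETCH_MAX_NODES);
    if (leaf->pages[local_node])
        return leaf->pages[local_node];
    if (!allow_remote)
        return NULL;
    for (n = 0; n < SKETCH_MAX_NODES; n++)
        if (leaf->pages[n])
            return leaf->pages[n];
    return NULL;
}
```

A caller that insists on a local copy would pass allow_remote as zero and treat a NULL return as the cue to make a new replica.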
When the kernel handles a page fault for a mapped text page, it insists on
a local copy of the page. If no such copy exists, and memory is available,
a local copy will be made and added to the page cache. The processor then
continues with its work, using the local version of the shared page. The
result, from a set of quick benchmarks posted with the patch, is a
performance improvement of 109% to 143%. In other words, it may well be
worth the trouble.
This patch is not quite ready for prime time, however; Dave notes:
This is still pretty experimental, so don't give it to your bank or
anything. I've lightly corrupted data playing with it, although
not in at least a week :)
The current code punts on a couple of important issues. When a process
tries to write to a file with replicated pages, for example, those pages
must be collapsed down to a single copy before the write can be allowed -
or inconsistent copies will result. Similarly, if the last writer closes a
file, that file suddenly becomes a candidate for replication. The patch,
as posted, detects these situations but does not fully implement their
resolution. A production-ready patch would also certainly have a
mechanism for freeing
replicated pages when memory gets tight. Given that this patch is clearly
not 2.6 material, however, Dave has a long time to work out those details.
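To make the "collapse before write" requirement concrete, here is a hedged user-space sketch of what collapsing a replicated page down to one copy could involve. It is not taken from the patch, which (as noted) does not fully implement this step; every name below is hypothetical, and the hard parts (unmapping the dropped copies from processes and freeing them) are left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_NODES 4    /* hypothetical node count */

struct sketch_page {          /* stand-in for struct page */
    int node;
};

/* One leaf per file page: at most one copy of the page per node. */
struct sketch_page_cache_leaf {
    struct sketch_page *pages[SKETCH_MAX_NODES];
};

/*
 * Collapse a replicated leaf down to the single copy on keep_node
 * (or, if that node has no copy, the first copy found). Returns the
 * number of replicas dropped from the leaf.
 */
static int sketch_leaf_collapse(struct sketch_page_cache_leaf *leaf,
                                int keep_node)
{
    int n, dropped = 0;

    assert(keep_node >= 0 && keep_node < SKETCH_MAX_NODES);
    if (!leaf->pages[keep_node]) {
        for (n = 0; n < SKETCH_MAX_NODES; n++) {
            if (leaf->pages[n]) {
                keep_node = n;
                break;
            }
        }
    }
    for (n = 0; n < SKETCH_MAX_NODES; n++) {
        if (n != keep_node && leaf->pages[n]) {
            leaf->pages[n] = NULL;   /* a real version would free it */
            dropped++;
        }
    }
    return dropped;
}
```

Only after the collapse succeeds could a write safely proceed against the one surviving copy.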
One of the longer-running current discussions on linux-kernel (and
linux-net, and netdev) was started on July 27, when Bas Bloemsaat pointed
out a problem that he was having. The Linux implementation of ARP, it
seems, is not working as he would like.
ARP, the Address Resolution Protocol, is the means by which IP addresses
are translated to physical layer MAC (usually ethernet) addresses. ARP
makes local area networks work by enabling systems to find each other.
When one system has a packet to transmit to another on the local network,
it broadcasts an ARP request packet seeking a MAC address for a given IP
address. Some machine (usually the intended recipient) hopefully responds
with the corresponding MAC address, and the packet gets sent.
If a Linux system (with a default configuration) receives an ARP request on
one of its interfaces, and that request is looking for an IP address
assigned to any of the system's interfaces, the system will respond to the
ARP request through the interface that received it. This response
happens even if the interface involved is not the one to which the
requested address has been assigned. Mr. Bloemsaat's problems came about
because his system has two interfaces plugged into the same network. Both
interfaces receive - and respond to - ARP requests sent on that network.
Depending on the order in which the responses are sent, traffic could be
directed to the wrong interface.
Mr. Bloemsaat included a patch which restricts ARP responses to the
interface actually implementing the requested address. But, over almost a
month of discussion, the networking hackers have made it clear that they do
not intend to change the way Linux behaves. Their reasoning follows, more
or less, these lines:
- Blocking ARP responses in this way is putting filtering decisions
at the wrong layer of the networking code. This sort of action
belongs at the netfilter level, rather than down at the device level.
- Linux's approach to ARP responses is fully compliant with all
applicable RFCs.
- In some situations, responding out of all interfaces is the only way
to successfully get communication established.
- For situations where the default ARP behavior causes problems, the
arp_filter sysctl knob can be used to change things. This
knob is described in networking/ip-sysctl.txt in the
kernel documentation directory. For those who do not want to do this
sort of tweaking directly, the ebtables package presents
an easier interface.
A lot of the confusion, it seems, comes down to a subtle difference in how
systems handle IP addresses. Many (perhaps most) networking
implementations treat addresses as "belonging" to the interfaces they are
assigned to. With that view of the world, no network interface has any
business responding to an ARP request for an address which is assigned
elsewhere. Linux, instead, sees IP addresses as a property of the system
as a whole. So it makes sense for an interface to respond to a request for
one of the system's addresses, even if that address is normally associated
with a different interface.
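The two views of address ownership can be reduced to a single decision, sketched below in user-space C. This is an illustration of the policy difference only, not the kernel's ARP code; the sketch_iface structure, the function name, and the strict flag are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_iface {
    const char *name;
    uint32_t addr;        /* IPv4 address assigned to this interface */
};

/*
 * Decide whether an ARP request for target, received on the
 * interface at index rx, gets a reply.
 *   strict == 0: addresses belong to the system, so reply if any
 *                local interface holds the address.
 *   strict == 1: addresses belong to interfaces, so reply only if
 *                the receiving interface holds the address.
 */
static int sketch_arp_should_reply(const struct sketch_iface *ifs, size_t n,
                                   size_t rx, uint32_t target, int strict)
{
    size_t i;

    assert(rx < n);
    if (strict)
        return ifs[rx].addr == target;
    for (i = 0; i < n; i++)
        if (ifs[i].addr == target)
            return 1;
    return 0;
}
```

With two interfaces on the same network, the non-strict policy answers a request for either address on both interfaces, which is exactly the situation that prompted the discussion.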
The networking RFCs make it clear that either view of IP addresses is
legitimate. Armed with that, and their sense of how things should work,
the networking hackers are determined to keep Linux's ARP behavior as it
is.
Patches and updates
Kernel trees
Build system
Core kernel code
- Con Kolivas: O16.2int. (August 16, 2003)
- Con Kolivas: O17int. (August 19, 2003)
Development tools
Device drivers
Janitorial
Memory management
Networking
Architecture-specific
Page editor: Jonathan Corbet