Brief items
The current 2.6 prepatch is 2.6.12-rc6, released by Linus on June 6. This
one should, if all goes well, be the final testing release before 2.6.12
comes out. Most of the patches are basic fixes, but there is also the
(temporary, hopefully) removal of the Philips webcam decompression code,
the conversion of the IDE code over to the device model way of doing
things, a CPU frequency controller update, and a user-mode Linux update.
See the long-format changelog for the details.
Linus's git repository has since accumulated a few dozen small fixes.
The current -mm tree is 2.6.12-rc6-mm1.
Recent additions to -mm include semi-persistent permissions for sysfs
files, the "scalable TCP" congestion control algorithm, hotplug CPU support
for the x86_64 architecture, RapidIO support (see below), an NFS update,
an unlocked_ioctl() operation for block devices,
and the v9fs filesystem (covered here last
month).
Kernel development news
My things-to-worry-about folder still has 244 entries. Nobody
seems to care much. Poor me.
--
Andrew Morton
This is the kind of crap that happens when drivers in the kernel
are not self contained, and need "external stuff" to work properly.
It means that simple things like NFS root over the device do not
work in a straightforward, simple, and elegant manner.
I am likely to always take the position that device firmware
belongs in the kernel proper, not via these userland and filesystem
loading mechanism, none of which may be even _available_ when
we first need to get the device going.
--
David Miller
Paul McKenney has taken some time and written up a detailed summary of the
current status of Linux realtime support. The resulting document
starts with a discussion of the problem, then works through the
various approaches being taken to provide realtime response with Linux.
Worth a read if you have any interest in this area.
The timer interrupt is one of the most predictable events on a Linux
system. Like a heartbeat, it pokes the kernel every so often (about every
1ms on most systems), enabling the kernel to note the passage of time, run
internal timers, etc. Most of the time, the timer interrupt handler just
does its job and nobody really notices.
There are times, however, when this interrupt can be unwelcome. Many
processors, when idle, can go into a low-power state until some work comes
along. To such processors, the timer interrupt looks like work. If there
is nothing which actually needs to be done, however, then the processor
might be powering up 1000 times per second for no real purpose. Timer
interrupts can also be an issue on virtualized systems; if a system is
hosting dozens of Linux instances simultaneously, the combined load from
each instance's timer interrupt can add up to a substantial amount of
work. So it has often been thought that there would be a benefit to
turning off the timer interrupt when there is nothing for the system to do.
Tony Lindgren's dynamic tick patch is
another attempt to put a lid on the timer interrupt. This version of the
patch only works on the i386 architecture, but it is simple enough that
porting it to other platforms should not be particularly difficult.
The core of the patch is a hook into the architecture-specific
cpu_idle() function. If a processor has run out of work and is
about to go idle, it first makes a call to
dyn_tick_reprogram_timer(). That function checks to see whether
all other processors on the system are idle; if at least one processor
remains busy, the timer interrupt continues as always. Experience has
shown that trying to play games with the timer interrupt while the system
is loaded leads to a net loss in performance - the overhead of reprogramming
the clock outweighs the savings. So, if the system is working, no changes
are made to the timer.
If, instead, all CPUs on the system are idle, there may be an opportunity
to shut down the timer interrupt for a while. When the system goes idle,
there are only two events which can create new work to do: the completion
of an I/O operation or the expiration of an internal kernel timer. The
dynamic tick code looks at when the next internal timer is set to go off,
and figures it might be able to get away with turning off the hardware
timer interrupt until then. After applying some tests (there are minimum
and maximum allowable numbers of interrupts to skip), the code reprograms
the hardware clock to interrupt after this time period, and puts the
processor to sleep.
At some point in the future, an interrupt will come along and wake the
processor. It might be the clock interrupt which had been requested
before, or it could be some other device - a keyboard or network interface,
for example. The dynamic tick code hooks into the main interrupt handler,
causing its own handler to be invoked for every interrupt on the system,
regardless of source. This code will figure out how many clock interrupts
were actually skipped, then loop calling do_timer_interrupt()
until it catches up with the current time. Finally, the interrupt handler
restores the regular timer interrupt, and the system continues as usual.
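The catch-up step might look roughly like the following user-space model. Everything here is an assumption made for illustration: the clock_state structure, the fake_timer_tick() stand-in for the kernel's per-tick work, and the idea that the current tick count can be read from a free-running hardware clock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the kernel's notion of time. */
struct clock_state {
    unsigned long jiffies;      /* ticks accounted for so far */
    unsigned long tick_calls;   /* how many times the tick work ran */
};

/* Stand-in for the per-tick work done by the real timer handler. */
static void fake_timer_tick(struct clock_state *cs)
{
    cs->jiffies++;
    cs->tick_calls++;
}

/*
 * Called on any interrupt after a sleep. hw_ticks_now is the tick
 * count derived from a free-running hardware clock. The tick work is
 * replayed until the accounted time catches up; the number of
 * replayed ticks is returned.
 */
unsigned long dyn_tick_catch_up(struct clock_state *cs,
                                unsigned long hw_ticks_now)
{
    unsigned long replayed = 0;

    assert(cs != NULL);
    while (cs->jiffies < hw_ticks_now) {
        fake_timer_tick(cs);
        replayed++;
    }
    return replayed;
}
```

Replaying the ticks one at a time, rather than adding the difference to the counter in one step, lets the existing per-tick bookkeeping run unmodified for each skipped tick.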
The end result is a system which can drop down to about 6 timer interrupts
per second when nothing is going on. That should eventually translate into
welcome news for laptop users and virtual hosters running Linux.
One of the patch sets which showed up in the 2.6.12-rc6-mm1 kernel is the
RapidIO subsystem, contributed by Matt
Porter (of MontaVista). Your editor, being ignorant of the
RapidIO standard, decided to have a look.
RapidIO turns out to be a sort of backplane interconnect intended mainly
for embedded systems. It allows for multiple hosts to exist on the same
bus and work collaboratively with the available peripherals. It is a sort
of highly local area network.
The RapidIO site provides no end of highly detailed specifications for the
truly curious. The rest of us, however, can learn a lot by looking at a network driver packaged with the rest of the
Linux RapidIO patch. This driver provides a simple example of how to use
the API provided by the RapidIO layer; it enables network packets to be
exchanged with another host on the RapidIO bus.
The RapidIO subsystem is integrated with the device model, so it provides
the expected structures: rio_dev and rio_driver.
Drivers can register a probe() function which enables them to take
responsibility for devices (which can be other hosts) as they turn up on
the interconnect. The example network driver uses a wildcard ID table so
that it is given the opportunity to work with all other devices out there;
it will happily send packets to any suitably capable device.
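To illustrate how a wildcard ID table lets one driver claim every device, here is a small stand-alone sketch of table matching. It does not use the RapidIO API: the dev_id structure, the ANY_ID value, the match_id() helper, and the example tables are all hypothetical, modeled loosely on how device-model buses generally match drivers to devices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffu          /* hypothetical "match anything" value */

/* Hypothetical ID pair: vendor ID and device ID. */
struct dev_id {
    uint16_t vid;
    uint16_t did;
};

/* A wildcard table, as a driver willing to talk to any device might use. */
const struct dev_id wildcard_table[] = {
    { ANY_ID, ANY_ID },
    { 0, 0 }                    /* all-zero entry terminates the table */
};

/* A table naming one specific (made-up) device. */
const struct dev_id specific_table[] = {
    { 0x0002, 0x0010 },
    { 0, 0 }
};

/* Return the first table entry matching the device, or NULL if none does. */
const struct dev_id *match_id(const struct dev_id *table,
                              uint16_t vid, uint16_t did)
{
    assert(table != NULL);
    for (; table->vid != 0 || table->did != 0; table++) {
        int vid_ok = table->vid == ANY_ID || table->vid == vid;
        int did_ok = table->did == ANY_ID || table->did == did;

        if (vid_ok && did_ok)
            return table;
    }
    return NULL;
}
```

With the wildcard table, match_id() succeeds for any vendor and device pair, so the driver's probe() function would be offered every device that appears; with the specific table, only the one named device matches.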
"Suitably capable," in this case, means that the device implements the two
basic primitives used to communicate across the RapidIO interconnect.
"Doorbells" are a way of sending simple, out-of-band signals to remote
nodes; the doorbells used by the network driver are those which announce
device addition and removal events. Most work, however, is done with
"mailboxes," essentially a reliable packet delivery service. If one
RapidIO device sends a message to another via a mailbox, the lower levels
will do their best to ensure that the message arrives uncorrupted and in
the right order.
So how does one RapidIO network node send a packet to another? Taking out
the usual overhead and error handling, it comes down to the following:
    static int rionet_start_xmit(struct sk_buff *skb, struct net_device *ndev)
    {
        struct rionet_private *rnet = ndev->priv;

        /* Lookup of rdev (the destination device) is omitted here. */
        rio_add_outb_message(rnet->mport, rdev, 0, skb->data, skb->len);
        return 0;
    }
rdev is a rio_dev structure corresponding to the
destination host on the RapidIO backplane. This call sends the data in the
network packet (skb) out through the given mailbox to the desired
device. When the transmission is
complete, the driver will receive a callback so that it can perform any
necessary cleanup (freeing the skb in this case).
Packet reception requires setting up a ring of receive buffers, much like
one would see in any network driver. In this case, the necessary code
looks like:
    do {
        rnet->rx_skb[i] = dev_alloc_skb(RIO_MAX_MSG_SIZE);
        if (!rnet->rx_skb[i])
            break;
        rio_add_inb_buffer(rnet->mport, RIONET_MAILBOX,
                           rnet->rx_skb[i]->data);
    } while ((i = (i + 1) % RIONET_RX_RING_SIZE) != end);
The RapidIO subsystem maintains a list of buffers waiting for incoming
mailbox messages; new buffers are added with
rio_add_inb_buffer(). When a message actually shows up, the
driver gets a callback (established when the mailbox is allocated), which,
in the end, does the following:
    if (!(data = rio_get_inb_message(rnet->mport, RIONET_MAILBOX)))
        break;
    rnet->rx_skb[i]->data = data;
    skb_put(rnet->rx_skb[i], RIO_MAX_MSG_SIZE);
    error = netif_rx(rnet->rx_skb[i]);
The code assumes that anything arriving on the given mailbox will be a
network packet. Beyond that, little checking is required; all of the
details, including data integrity checks, will have been taken care of by
the lower levels.
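The buffer-ring pattern used on the receive side can be modeled in plain user-space C. This is only a sketch of the idea, not the RapidIO implementation: the rx_ring structure, both functions, and the sizes are hypothetical, and message arrival and retrieval are collapsed into a single call for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 4             /* hypothetical ring length */
#define MSG_SIZE  16            /* hypothetical maximum message size */

/* Buffers posted by the driver, waiting for incoming messages. */
struct rx_ring {
    unsigned char *buf[RING_SIZE];
    size_t head;                /* oldest posted buffer */
    size_t count;               /* number of buffers currently posted */
};

/* Post an empty buffer; returns 0 on success, -1 if the ring is full. */
int ring_add_buffer(struct rx_ring *r, unsigned char *buf)
{
    assert(r != NULL && buf != NULL);
    if (r->count == RING_SIZE)
        return -1;
    r->buf[(r->head + r->count) % RING_SIZE] = buf;
    r->count++;
    return 0;
}

/*
 * Simulate a message arriving: copy it into the oldest posted buffer
 * and hand that buffer back. Returns NULL if no buffer is posted or
 * the message does not fit.
 */
unsigned char *ring_deliver(struct rx_ring *r, const void *msg, size_t len)
{
    unsigned char *buf;

    assert(r != NULL && msg != NULL);
    if (r->count == 0 || len > MSG_SIZE)
        return NULL;
    buf = r->buf[r->head];
    memcpy(buf, msg, len);
    r->head = (r->head + 1) % RING_SIZE;
    r->count--;
    return buf;
}
```

Buffers come back in the order they were posted, which is what allows a driver to pair each returned data pointer with the skb it allocated for that slot.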
The list of RapidIO-capable devices is small at the moment, but appears to
be growing. As these devices become available, Linux will have the
low-level infrastructure needed to support them. The embedded Linux
community has often been accused of keeping its work to itself and not
contributing back to the kernel as a whole. The contribution of the
RapidIO subsystem is another sign that this situation may be changing;
that, perhaps, is more welcome than the code itself.
If there is one thing that almost all kernel developers agree with, it's
that more testing is a good thing - especially if the results are presented
in a useful way. Martin Bligh thus got a warm reception when he
announced a new kernel testing facility. As
he put it:
Currently it builds and boots any mainline, -mjb, -mm kernel within
about 15 minutes of release. runs dbench, tbench, kernbench, reaim
and fsx. Currently I'm using a 4x AMD64 box, a 16x NUMA-Q, 4x
NUMA-Q, 32x x440 (ia32) PPC64 Power 5 LPAR, PPC64 Power 4 LPAR, and
PPC64 Power 4 bare metal system.
This is, indeed, a fairly wide range of coverage. The results
are presented as a simple table, showing which kernels passed the tests and
which did not. When a kernel fails a test, the relevant information is
provided (though, often, that information is simply "did not boot," which
is not entirely helpful).
These results have been augmented with benchmark
results, presented in a handy graphic form. The kernbench graph,
for example, shows that kernbench performance improved significantly
around 2.6.6, and has held steady since 2.6.10. The -mm trees, however,
perform notably worse than the mainline, and the difference between the two
has been growing. The results have already led to some investigation into
what is going on; the current suspect is the (36!) scheduler patches
currently living in -mm.
Numerous others have worked at testing and benchmarking kernel releases.
Martin's work, however, has the advantages of being automated and
presenting the results in a reasonable way. With these attributes, this
project stands a good chance of helping the developers to produce better
kernels in the near future.
Page editor: Jonathan Corbet