Brief items
The current 2.6 development kernel is 2.6.26-rc9, released on July 5.
"Enough changes that we needed another -rc, and the regression list
isn't emptying fast enough either (probably because a number of people,
including reporters, are vacationing)." Along with the fixes there's
a new driver for cameras which implement the standard USB video class spec.
The long-format changelog has the details.
A few dozen changesets have been merged since 2.6.26-rc9, as of this
writing. They are mostly fixes, but there is also a printk()
extension which allows for higher-level format string specifiers; see below
for details.
The current -mm kernel is 2.6.26-rc8-mm1. Recent changes
to -mm include the MMU notifiers patch, a new alloc_pages_exact()
function for more efficient large allocations, a lot of checkpatch.pl
tweaks, and a patch series ending with
"revert-revert-revert-revert-linux-next-revert-bootmem-add-return-value-to-reserve_bootmem_node.patch"
that one probably doesn't want to know about.
The current stable 2.6 release is 2.6.25.10, released on July 2.
"It contains a number of assorted bugfixes all over the tree. And
once again, any users of the 2.6.25 kernel series are STRONGLY encouraged
to upgrade to this release."
Kernel development news
The problem is that SystemTap hasn't really benefited from
community based innovation largely because it doesn't have much of
a community. The bigger picture problem Red Hat didn't see when
they accepted the cash was that this project wouldn't generate a
community just from the usual publish the code and they will come
philosophy. The result is what we see to day: a klunky and
accident prone tool that has sun engineers writing trite little
homilies on the benefits of a planned political operating system.
--
James Bottomley
Think about some of the evil perpetrated by hal and the userspace
suspend-resume scripts (and how much complexity with random XML
fragments getting parsed by various dbus plugins), and tell me with
a straight face that you would trust these modern-day desktop
application writers with this interface. Because they *will* find
some interesting way to (ab)use it.....
--
Ted Ts'o
It's readily obvious that people (ie: top-level maintainers) aren't
even compile-testing their own stuff once it's merged into
linux-next. You say (but don't provide evidence that) linux-next
is too unstable to develop against. Well guess why? Because
people are choosing to let it be that way.
--
Andrew Morton
Not 15 minutes after David posted his note, we're now up to 11
reports; and this is only from an -mm patch series. Can you
imagine the number of bug reports if this were allowed to ship in a
mainline kernel.org release? One good thing is that we can
definitely show that there people that are downloading, compiling
and trying to build the -mm kernel. :-)
--
Ted Ts'o
External firmware is by design an error prone system, even with
versioning. But by being built and linked into the driver, it is
fool proof.
On a technical basis alone, we would never disconnect a crucial
component such as firmware, from the driver. The only thing
charging these transformations, from day one, is legal concerns.
--
David Miller
All forms of change introduce _some_ risk of breakage, of
course. In this case, as usual, I've tried to be careful to avoid
regressions. The most important part, obviously, was having a way
to build firmware into the static kernel.
When it comes to _modules_, doing that would introduce a certain
amount of complexity which just doesn't seem necessary -- if you
can load modules, then you have userspace, and you can use hotplug
for firmware too. Especially given that so many modern drivers
already _require_ you to do that, so the users understand it, and
the tools like mkinitrd already cope with it -- checking
MODULE_FIRMWARE() for the modules they include and including the
appropriate files automatically. [...]
You need to stay sober for long enough to say 'Y' when it asks you
if you want to build the required firmware into the kernel. And we
even made that the _default_ now, for the benefit of those who
can't stay sober that long. (Perhaps we'll make 'allyesconfig' the
default next?)
--
David Woodhouse
By Jonathan Corbet
July 8, 2008
One of the development process advantages brought by git (and by BitKeeper
before it) is the ability to see the up-to-the-second, bleeding-edge status
of Linus's tree. So any developer who wants to know where the front edge
of development lies can grab that tree and make patches fit into it. But
the value of the mainline repository for development would appear to be
less than it once was. The mainline is no longer where the action is.
Consider, for example, this response
from Andrew Morton after finding that a patch posted to linux-kernel
would not compile for him:
I assume this patch was prepared against some ancient out-of-date
kernel such as current Linus mainline. Guys, we have a new
development tree now.
He followed up with this statement:
But what I am repeatedly seeing is people cheerfully raising 2.6.27
patches against the 2.6.26 tree when we have a nice 2.6.27 tree for
developing against. Those days are over, guys.
So the message would appear to be clear: development work should be done
against the linux-next tree rather than against the mainline kernel. There
are some clear advantages to having work done in this way. Patches
developed against linux-next should merge cleanly during the next merge
window. Developers will be testing each other's trees as they work,
causing bugs to turn up earlier in the process. And, of course, Andrew
won't have to complain about patches which fail to build for him - at
least, not as often.
Linux-next is a somewhat strange base on which to try to develop, though.
It is built anew every day from over 100 subsystem trees, each of which
can, itself, change from one day to the next. So linux-next is a moving
target, just like the mainline is. But, unlike the mainline, linux-next
has no consistent or coherent history. Every day's linux-next tree is a
completely new creation with a unique - and transient - history.
Consider a developer who bases some work on a mainline release -
2.6.26-rc9, say. That developer's work will be derived from a specific
commit in the mainline tree, known as
b7279469d66b55119784b8b9529c99c1955fe747 in this case. The history from
2.6.26-rc9 is well defined, and that series of patches can be merged into
any other repository which also contains 2.6.26-rc9; the identity of that
commit is consistent and immutable across all repositories. With such a
development tree, it is (relatively) easy to track the mainline as it
advances, and to merge one's work when the time comes. A git tree based on
the mainline sits on a solid foundation.
It is not possible to base a tree on linux-next in the same way.
Development can begin at a specific commit, but tomorrow's linux-next tree
may not contain that commit at all. The various component trees will have
advanced independently of the previous day's linux-next tree, which can, in
itself, complicate things. But the process of making all those trees
come together can involve tasks like moving patches from one tree to another, or
fixing intermediate patches which break things. That makes the end result
better, but at the cost of rebasing those trees. Rebasing completely
rewrites the development history, causing the old history to disappear from
the tree. So a patch series based on the previous history loses its
foundation.
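The effect of a rebase on dependent work can be illustrated with a toy model. Git names each commit with a hash which covers the parent's name as well as the commit's own content, so the same patch replayed on a different parent becomes a different commit. The sketch below is not git's algorithm: it substitutes a 64-bit FNV-1a hash for SHA-1, and the function names and patch strings are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for git's commit naming: FNV-1a over parent ID + content.
 * Real git hashes the full commit object with SHA-1; this is only a model. */
static uint64_t fnv1a(uint64_t hash, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++) {
        hash ^= p[i];
        hash *= 1099511628211ULL;           /* FNV-1a 64-bit prime */
    }
    return hash;
}

/* A commit's ID depends on its parent's ID and on its own content. */
uint64_t commit_id(uint64_t parent_id, const char *patch)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV-1a 64-bit offset basis */

    h = fnv1a(h, &parent_id, sizeof(parent_id));
    return fnv1a(h, patch, strlen(patch));
}

/* Returns 1 if a series which recorded base_id as its starting point can
 * still find that commit among the IDs present in today's tree. */
int base_still_present(uint64_t base_id, const uint64_t *tree, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tree[i] == base_id)
            return 1;
    return 0;
}
```

In this model, a series started on a mainline commit keeps finding its base, while a series started on a subsystem commit which was later replayed on top of an inserted fix does not, even though the patch text never changed.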
And, since linux-next is built from its components every day, a patch
developed on top of linux-next may, when integrated into that tree, be
merged somewhere in the middle of the sequence; in other words, the patch
will be merged into a tree which differs considerably from the tree on
which it was developed. As Stephen Rothwell, the maintainer of the
linux-next tree, put it:
One downsides of the way linux-next works is that, because it is
recreated every day, you cannot really base anything on it that is
to be merged into it.
Another interesting aspect of linux-next development involves API changes.
The longstanding rule in kernel development is that internal kernel
interfaces can be changed if there is a good reason to do so, but that the
person making the change is obligated to fix all in-tree code broken by
that change. If an API change is introduced into linux-next, though, the
developer is simply not able to fix any code which enters linux-next by way
of the other subsystem trees. If the developer does get patches for the API
change into those trees, the trees can no longer be built on top of kernels
which lack that change - the mainline, for example. API changes have, in
other words, become harder to do - a situation which some may see as a good
thing.
What all this means is that API changes must be handled through techniques
like the creation of backward-compatibility layers; those layers can then
be removed a development cycle or two later once the transition is
complete. Or changes can be split up and added to individual subsystem
trees; that, however, can lead to interesting ordering dependencies between
the trees. In some cases, we are seeing 2.6.27 changes being merged into 2.6.26 in stub form as a way
of making all of the pieces fit together.
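A backward-compatibility layer of this kind is often nothing more than the old entry point kept alive as a thin wrapper around the new one. The following is a minimal sketch of that pattern; struct widget, widget_start(), and widget_start_flags() are hypothetical names which do not correspond to any real kernel interface.

```c
#include <stddef.h>

/* Hypothetical subsystem object; not a real kernel structure. */
struct widget {
    unsigned int flags;
    int started;
};

#define WIDGET_DEFAULT_FLAGS 0u

/* New interface: callers now pass flags explicitly. */
int widget_start_flags(struct widget *w, unsigned int flags)
{
    if (w == NULL)
        return -1;
    w->flags = flags;
    w->started = 1;
    return 0;
}

/* Backward-compatibility layer: the old one-argument call survives as a
 * thin wrapper, so code arriving by way of other trees keeps building.
 * It can be deleted a cycle or two later, once all callers are converted. */
static inline int widget_start(struct widget *w)
{
    return widget_start_flags(w, WIDGET_DEFAULT_FLAGS);
}
```

The cost is a period during which two spellings of the same operation coexist; the benefit is that no tree has to be converted in lockstep with any other.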
Then, there is the simple matter that developers like to have a stable base
upon which to create their code. The linux-next tree, since it contains
large amounts of relatively new code, will also contain its share of new
bugs. That makes developers, who are often having enough trouble just
tracking down their own bugs, somewhat grumpy. Development against the
mainline tends to have a lower probability of forcing developers to look
for bugs which are not of their own making.
Many of these complaints have an easy answer: the pain which comes from
making all the pieces fit together in linux-next must be faced at some
point anyway. The real difference is that linux-next allows those problems
to be dealt with at leisure, while the older "merge everything in the
mainline" model compressed much of that work into the merge window. How
beneficial that really is will be seen for the first time in the 2.6.27
merge window; if linux-next is serving its intended function, 2.6.27 should
come together with rather less hassle than its immediate predecessors did.
But, regardless of the value provided by linux-next for integration and
testing purposes, the fact remains that it is a difficult platform upon
which to develop patches. That process is somewhat like building a house
on a sand bar; overnight the tide comes in and completely reshapes the land
underneath you. That is why most (possibly all) of the subsystem trees
used to assemble linux-next are, themselves, based on the mainline.
The solution to that problem will have to evolve over time. The linux-next
tree is a new institution which is still finding its proper place in the
development process. Easier ways to develop patches against the linux-next
tree will certainly be worked out; it may well turn out that quilt-like
tools work better for this task than git. For now, linux-next is an
excellent integration and testing resource, but it has not quite yet
managed to become the true Linux kernel development tree.
By Jonathan Corbet
July 8, 2008
One of the fundamental data structures in the networking subsystem is the
transmit queue associated with each device.
The core networking code will call a driver's
hard_start_xmit() function to let the driver know that a packet is
ready for transmission; it is then the
driver's job to feed that packet into the hardware's transmit queue.
The result is a data structure which looks vaguely like this:
"Vaguely" because the list of sk_buff structures (SKBs - the
internal representation of packets) does not exist in this form within the
kernel; instead, the driver maintains the queue in a way that the hardware
can process it.
This is a scheme which has worked well for years, but it has run into a
fundamental limitation: it does not map well to devices which have multiple
transmit queues. Such devices are becoming increasingly common, especially
in the wireless networking area. Devices which implement the Wireless
Multimedia Extensions, for example, can have four different classes of
service: video, voice, best-effort, and background. Video and voice
traffic may receive higher priority within the device - it is
transmitted first - and the device can also take more of the available air
time for such packets. The queues for this kind of traffic may, however,
be relatively short; if a video packet doesn't get sent on its way quickly,
the receiving end will lose interest and move on. So it might be better to just
drop video packets which have been delayed for too long.
On the other hand, the "background" level only gets transmitted if there is
nothing else to do; it is well-suited to low-priority traffic like BitTorrent
or email from the boss. It would make sense to have a
relatively long queue for background packets, though, to be able to take
full advantage of a lull in higher-priority traffic.
Within these devices, each class of service has its own transmit queue.
This separation of traffic makes it easy for the hardware to choose which
packet to transmit next. It also allows independent limits on the size of
each queue; there is no point in filling the device's queue space with
background traffic which is not going to be transmitted in any case. But
the networking subsystem does not have any built-in support for multiqueue
devices. This hardware has been driven using a number of creative
techniques which have gotten the job done, but not in an optimal way. That
may be about to change, though, with the advent of David Miller's multiqueue transmit patch
series.
The current code treats a network device as the fundamental unit which is
managed by the outgoing packet scheduler. David's patches change that
idea somewhat, since each transmit queue will need to be scheduled
independently. So there is a new netdev_queue structure which
encapsulates all of the information about a single transmit queue, and
which is protected by its own lock. Multiqueue drivers then set up an
array of these structures. So the new data structure can, with sufficient
imagination, be seen to look something like this:
Once again, the actual lists of outgoing packets normally exist in the form
of special data structures in device-accessible memory. Once the device
has these queues set up for it, the various policies associated with each
class of service can be implemented. Each queue is managed independently,
so more voice packets can be queued even if some other queue (background,
say) is overflowing.
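The independence of the queues can be modeled in a few lines of user-space C. This is only a sketch of the idea, not the kernel's netdev_queue: the structure and function names, the ordering of the classes, and the queue limits are all assumptions, and the per-queue lock is left out entirely.

```c
#include <stddef.h>

#define MODEL_NUM_QUEUES 4

/* Indices for the four WME classes; this numbering is an assumption. */
enum model_class {
    MODEL_VOICE,
    MODEL_VIDEO,
    MODEL_BEST_EFFORT,
    MODEL_BACKGROUND
};

/* User-space stand-in for one transmit queue. The kernel structure also
 * carries its own lock, which is omitted here. */
struct model_tx_queue {
    unsigned int len;     /* packets currently queued */
    unsigned int limit;   /* independent per-queue limit */
};

/* A multiqueue device is, in this model, just an array of queues. */
struct model_device {
    struct model_tx_queue tx[MODEL_NUM_QUEUES];
};

/* Give each class its own limit: short for latency-sensitive traffic,
 * long for background traffic. The numbers are arbitrary. */
void model_device_init(struct model_device *dev)
{
    static const unsigned int limits[MODEL_NUM_QUEUES] = { 8, 8, 32, 128 };

    for (size_t i = 0; i < MODEL_NUM_QUEUES; i++) {
        dev->tx[i].len = 0;
        dev->tx[i].limit = limits[i];
    }
}

/* Queue one packet; returns 0 on success, -1 if that queue is full. */
int model_enqueue(struct model_device *dev, enum model_class c)
{
    struct model_tx_queue *q = &dev->tx[c];

    if (q->len >= q->limit)
        return -1;
    q->len++;
    return 0;
}
```

Filling the background queue to its limit causes further background packets to be refused, but a voice packet queued afterward is still accepted, since each queue only looks at its own state.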
David would appear to have worked hard to avoid creating trouble for
network driver developers. Drivers for single-queue devices need not be
changed at all, and the addition of multiqueue support is relatively
straightforward. The first step is to replace the
alloc_etherdev() call with a call to:
    struct net_device *alloc_etherdev_mq(int sizeof_priv,
                                         unsigned int queue_count);
The new queue_count parameter describes the maximum number of
transmit queues that the device might support. The actual number in use
should be stored in the real_num_tx_queues field of the
net_device structure. Note that this value can only be changed
when the device is down.
A multiqueue driver will get packets destined for any queue via the usual
hard_start_xmit() function. To determine which queue to use, the
driver should call:
    u16 skb_get_queue_mapping(struct sk_buff *skb);
The return value is an index into the array of transmit queues. One might
well wonder how the networking core decides which queue to use in the first
place. That is handled via a new net_device callback:
    u16 (*select_queue)(struct net_device *dev, struct sk_buff *skb);
The patch set includes an implementation of select_queue() which
can be used with WME-capable devices.
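The division of labor between the two calls can be sketched in user space. What follows is not the implementation from the patch set: struct model_skb and both functions are stand-ins, and the queue numbering is an assumption. The table follows the usual WMM grouping of the eight 802.1D priority values (1 and 2 are background, 0 and 3 best-effort, 4 and 5 video, 6 and 7 voice).

```c
#include <stdint.h>

/* Minimal stand-in for an SKB: only the fields this sketch needs. */
struct model_skb {
    unsigned int priority;    /* 0..7, 802.1D-style priority */
    uint16_t queue_mapping;   /* chosen transmit queue index */
};

/* Assumed queue numbering: 0 voice, 1 video, 2 best effort, 3 background. */
static const uint16_t prio_to_queue[8] = { 2, 3, 3, 2, 1, 1, 0, 0 };

/* Plays the role of select_queue(): pick a queue and record the choice
 * in the packet. Out-of-range priorities are treated as priority 0. */
uint16_t model_select_queue(struct model_skb *skb)
{
    unsigned int prio = skb->priority < 8 ? skb->priority : 0;

    skb->queue_mapping = prio_to_queue[prio];
    return skb->queue_mapping;
}

/* Plays the role of skb_get_queue_mapping() in the driver's transmit
 * path: the driver reads back what the core decided earlier. */
uint16_t model_get_queue_mapping(const struct model_skb *skb)
{
    return skb->queue_mapping;
}
```

The point of the split is that the classification policy lives in one callback, while the transmit path only needs to look up the result.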
About the only other required change is for multiqueue drivers to inform
the networking core about the status of specific queues. To that end,
there is a new set of functions:
    struct netdev_queue *netdev_get_tx_queue(struct net_device *dev,
                                             u16 index);
    void netif_tx_start_queue(struct netdev_queue *dev_queue);
    void netif_tx_wake_queue(struct netdev_queue *dev_queue);
    void netif_tx_stop_queue(struct netdev_queue *dev_queue);
A call to netdev_get_tx_queue() will turn a queue index into the
struct netdev_queue pointer required by the other functions, which
can be used to stop and start the queue in the usual manner. Should the
driver need to operate on all of the queues at once, there is a set of
helper functions:
    void netif_tx_start_all_queues(struct net_device *dev);
    void netif_tx_wake_all_queues(struct net_device *dev);
    void netif_tx_stop_all_queues(struct net_device *dev);
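Stopping and starting a queue "in the usual manner" generally means stopping it when the hardware ring fills and waking it when a transmit completion frees a slot. The user-space sketch below models that pattern for a single queue; the model_ names, the ring size, and the return values are assumptions made for illustration rather than anything taken from the patches.

```c
#include <stdbool.h>

#define MODEL_RING_SIZE 4

/* Stand-in for one hardware transmit ring plus its "stopped" flag. */
struct model_queue {
    unsigned int in_flight;   /* descriptors handed to the hardware */
    bool stopped;             /* set when the core must not send more */
};

/* Models of the per-queue stop and wake operations. */
static void model_stop_queue(struct model_queue *q) { q->stopped = true; }
static void model_wake_queue(struct model_queue *q) { q->stopped = false; }

/* Transmit path: accept the packet, then stop the queue if the ring has
 * just become full. Returns 0 on success, -1 if the queue was already
 * stopped (the core should not be calling in that state). */
int model_xmit(struct model_queue *q)
{
    if (q->stopped)
        return -1;
    q->in_flight++;
    if (q->in_flight == MODEL_RING_SIZE)
        model_stop_queue(q);
    return 0;
}

/* Completion path: the hardware has finished one packet, so there is
 * room in the ring again and a stopped queue can be woken. */
void model_tx_complete(struct model_queue *q)
{
    if (q->in_flight == 0)
        return;
    q->in_flight--;
    if (q->stopped)
        model_wake_queue(q);
}
```

Because the flag belongs to the queue rather than to the device, a full background ring stops only background traffic; a second queue instance for voice continues to accept packets.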
Naturally, there are a few other details to deal with, and the multiqueue
interface is likely to evolve somewhat over time. At one point, David was
hoping to have this feature ready for inclusion into 2.6.27, but that goal
looks overly ambitious now. It does seem that much of the ground work will be merged in the
next development cycle, though, meaning that full multiqueue support should
be in good shape for merging in 2.6.28.
By Jake Edge
July 9, 2008
A change very late in the development cycle for 2.6.26 provides a framework
for extending printk() to handle new kinds of arguments. Linus Torvalds
just merged the change after -rc9, presumably in part because he knew he
could trust the author, but also because it should have no effect on the
kernel. It will provide for better debugging output once code is changed to
take advantage of it.
The core idea is to extend printk() so that kernel data structures
can be formatted in kernel-specific ways. In order to get some
compile-time checking,
the %p format specifier has been overloaded.
For example, %pI might be used to indicate that the associated
pointer is to be formatted as a struct inode, which could print
the most interesting fields of that structure. GCC will be able to check
for the presence of a pointer argument, but because it does not understand
the I part, cannot enforce that it is a pointer of the right type.
Extending printk() in this manner allowed Torvalds, who authored the patch,
to add two new types to printk(): %pS for symbolic pointers and
%pF for symbolic function pointers. In both cases, the code uses
kallsyms to turn the pointer value into a symbol name. Instead of
a kernel developer having to read long address strings and then trying to
find them in the system map, the kernel will do that work for them.
The %pF specifier is for architectures like ppc and ia64 that use
function descriptors rather than pointers. For those architectures, a function
pointer points to a structure that contains the actual function address.
By using the %pF specifier, the proper dereferencing is done.
As an example of how the augmented printk() could be used,
Torvalds converted
printk_address(). The
CONFIG_KALLSYMS dependency and the kallsyms_lookup() were
removed, essentially leaving a one-line function:
    printk(" [<%016lx>] %s%pS\n", address, reliable ? "": "? ", (void *) address);
If kallsyms is not present, the new printk() just reverts to printing the
address in hexadecimal, which allows the special-case handling to be done
there.
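The lookup-with-fallback behavior can be sketched in user space. Here, a small invented table stands in for the kallsyms data, and model_format_symbol() is a hypothetical helper rather than a kernel function. The name+0xoffset/0xsize layout mirrors the way the kernel conventionally prints symbolic addresses.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical symbol table entry; a real kernel uses kallsyms data. */
struct model_sym {
    unsigned long start;
    unsigned long size;
    const char *name;
};

/* Invented addresses and names, for illustration only. */
static const struct model_sym model_symtab[] = {
    { 0x1000, 0x80,  "model_init" },
    { 0x1080, 0x40,  "model_xmit" },
    { 0x2000, 0x100, "model_exit" },
};

/* Format addr as "name+0xoff/0xsize" when a symbol covers the address,
 * and as plain hexadecimal otherwise. Returns snprintf()'s result. */
int model_format_symbol(char *buf, size_t len, unsigned long addr)
{
    size_t n = sizeof(model_symtab) / sizeof(model_symtab[0]);

    for (size_t i = 0; i < n; i++) {
        const struct model_sym *s = &model_symtab[i];

        if (addr >= s->start && addr - s->start < s->size)
            return snprintf(buf, len, "%s+0x%lx/0x%lx",
                            s->name, addr - s->start, s->size);
    }
    /* No symbol information: fall back to the raw address. */
    return snprintf(buf, len, "0x%lx", addr);
}
```

Putting the fallback inside the formatter is what lets callers drop their own conditional code: they pass a pointer and get the best available representation.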
The clear intent is to allow additional extensions to printk() to
support other kernel data structures. The change to
vsprintf(), which underlies printk(), actually allows for
any sequence of alphanumeric characters to appear after the %p.
The new pointer() helper function currently only implements the
two new specifiers, but others have been mentioned.
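A formatter built this way has to decide where the suffix ends. One simple rule, assumed in the sketch below, is to consume every alphanumeric character which follows the p and to dispatch on the first of them. The helper model_pointer_suffix() is a hypothetical user-space function written for illustration; it is not the kernel's code.

```c
#include <ctype.h>
#include <stddef.h>

/* Given a pointer to the 'p' of a "%p..." conversion, store the first
 * suffix character in *first ('\0' if there is none) and return the
 * number of alphanumeric suffix characters which follow the 'p'. */
size_t model_pointer_suffix(const char *p, char *first)
{
    size_t n = 0;

    while (isalnum((unsigned char)p[1 + n]))
        n++;
    *first = n ? p[1] : '\0';
    return n;
}
```

Under this rule "%pS\n" has a one-character suffix, "%p6N" a two-character one, and a %p followed by whitespace or punctuation is treated as an ordinary pointer.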
The most likely additions are for things like IPv4, IPv6, and MAC
addresses. Torvalds specifically mentions
using %p6N as a possibility for IPv6 addresses. Some would rather
have seen a different syntax used; %p{feature} was suggested, but that
would conflict with some current uses of %p in the kernel. Torvalds is
happy with his choice:
I _expressly_ chose '%p[alphanumeric]*' because it's basically
totally insane to have that in a *real* printk() string: the end result
would be totally unreadable.
The patch took an interesting route to the kernel, with much of the
discussion evidently going on in private between Torvalds, Andrew Morton,
and others before popping up on the linuxppc-dev and linux-ia64 mailing
lists. The patch itself has not been posted to linux-kernel in its
complete form, but was
committed on July 6. While it is a bit strange to see such a change this
late in the development cycle, it is a change that should have no impact as
there are no
plans to actually use the new specifiers in 2.6.26.
Page editor: Jonathan Corbet