Brief items
The current 2.6 prepatch is 2.6.20-rc1,
released by Linus just after LWN
came out on December 13. See
the short-form changelog for a
(long) list of patches merged for 2.6.20.
Several dozen patches
(a relatively small number) have been merged into the mainline git
repository since -rc1 came out.
The current -mm tree is 2.6.20-rc1-mm1. Recent changes
to -mm include a new version of the user-space device driver feature, an
idle notification facility for x86-64, the lumpy reclaim patch, and a new
version of the dynamic tick patch.
For older kernels: 2.6.18.6 was released on
December 18; it contains a fair number of fixes (including one which
is security-related).
Adrian Bunk has released 2.6.16.36 with several patches
and 2.6.16.37-rc1 with a few
dozen more.
Willy Tarreau has been busy, having released 2.4.33.5 (two security patches),
2.4.33.6 (one more), and 2.4.34-rc3 (perhaps the last
before the 2.4.34 final release).
Kernel development news
Whilst Red Hat's medical coverage fully covers "mental health"
issues, I'd really rather not proceed down this avenue. We can't
support *one* kernel properly. On what planet does it make sense to
throw more variants in the mix ?
-- Dave Jones
For a moment, it seemed like things could happen pretty quickly. Martin
Bligh suggested that, rather than trying to nickel-and-dime binary modules
to death, it would be more honest to just ban them outright. Andrew Morton
spoke out in favor of the idea as long as a one-year warning was provided.
Greg Kroah-Hartman hacked up a patch to insert the warning. And Linus, at
the outset, restricted himself to commenting on Greg's poetry.
The tide turned just as quickly, however. Linus spoke out against the change,
and Greg withdrew it. It
would appear that binary-only modules will continue to be loadable into the
kernel for the foreseeable future - though other hazards may await those who
distribute them.
The loading of proprietary modules was not banned for a few reasons, the
first of which is that there is, in fact, nothing wrong with doing so.
The GPL is quite clear in its statement that somebody who is in possession
of GPL-licensed code can use it in any way they wish. If they want to
combine their nice free kernel with a big, proprietary binary blob, they
are fully within their rights to do so. So banning proprietary modules in
the kernel source attacks the problem in the wrong place and attempts to
forbid an activity which is allowed by the license.
Even if the GPL could be interpreted as forbidding the loading of
binary-only modules, there is the fair use issue to consider. As a
community, we tend to be generally in favor of a broad interpretation of
fair-use rights. But fair use cuts both ways. A number of people in the
discussion warned against adopting the tactics favored by the entertainment
industry and taking an overly broad view of what the law allows copyright
owners to do. As Ben Collins put it:
The gradual changes to lock down kernel modules to a particular
license(s) tends to mirror the slow lock down of content
(music/movies) that people complain about so loudly. It's basically
becoming DRM for code.
The fact that some people were willing to discuss making use of the DMCA to
make sure that nobody could patch a proprietary module ban out of the code
tends to reinforce this view. Alan Cox noted that people tend to become that which
they fight. Most people in the community would probably agree that
the entertainment industry is not something we wish to become; this
realization has, arguably, done a lot to erode support for the idea of
banning proprietary modules.
What the GPL does cover is distribution; anybody who distributes something
derived from GPL-licensed code must do so under the terms of the GPL. So
it is the act of distributing proprietary modules which enters legally
questionable territory. But, as Linus points
out, the fact that a module can be loaded into the kernel does not
imply that the module is necessarily a derived work of the kernel. The
determination of derived work status is a complicated business, and can
often require a court to provide the definitive word. But banning all
proprietary modules on the idea that they are all illegal derived works is
a hard action to defend.
The end result is that no technical measures for blocking binary modules
will be added to the kernel anytime soon. Unhappiness with these
modules remains, however, as can be seen in Greg's message withdrawing the
patch:
It's just that I'm so damn tired of this whole thing. I'm tired of
people thinking they have a right to violate my copyright all the
time. I'm tired of people and companies somehow treating our
license in ways that are blatantly wrong and feeling fine about it.
Because we are a loose band of a lot of individuals, and not a
company or legal entity, it seems to give companies the chutzpah to
feel that they can get away with violating our license.
It seems clear that the issue will not go away, even though this particular
approach to addressing it has been rejected. The course which appears to
be open to disgruntled kernel developers is legal action: if the
distribution of a specific binary module can be shown to be a copyright
violation, then the copyright owners have the right to go to court to put a
stop to it. GPL enforcement efforts have, so far, tended to be
successful. So it would not be surprising to see one or more developers
decide to bring a suit against a binary module distributor in the next year
or so. The discontent which is so visibly out there is unlikely to just
fade away.
When Linus released 2.6.19, he expressed a certain degree of confidence
about its quality:
It's one of those rare "perfect" kernels. So if it doesn't happen
to compile with your config (or it does compile, but then does
unspeakable acts of perversion with your pet dachshund), you can
rest easy knowing that it's all your own d*mn fault, and you should
just fix your evil ways.
While this kernel may have lived up to expectations in a number of ways, it
would appear that somebody's evil ways have messed things up - and
dachshunds would be well advised to keep a low profile. It seems that
this kernel can corrupt ext3 filesystems - behavior which was not in the
original set of design goals.
The good news (for users) is that the bug is hard to trigger, and that most
access patterns work just fine. The bulk of the trouble seems to come with
a certain BitTorrent client, which has an unusual access pattern at best.
On occasion, parts of a page will end up being written as zeroes, through
to the end of the page. Please do not expect your editor to explain why
this is happening; it seems that nobody really understands that yet. The
solution, however, may involve some relatively serious low-level memory
management surgery.
The apparent origin of the problem is a change in how dirty pages are
tracked in the kernel. Prior to 2.6.19, this information lived in the page
tables; the 2.6.19 kernel, however, moves some of this information into the
page structure. This change enables better tracking of dirty
pages in the system, which is a good thing, but it could also be bringing
some old bugs out to play.
Not all of those bugs are necessarily in the kernel; at one point, Linus
went off and wrote a demonstration program
showing how a buggy program would work with older kernels but get
surprising results in 2.6.19. What it comes down to is that if a program
maps a file into memory, it cannot put data into that memory beyond the
current length of the file and expect that data to make it to disk. It was
a nice demonstration, but this behavioral change does not appear to be
behind the problem reports.
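The rule described above can be seen from user space. What follows is a minimal sketch, not Linus's demonstration program; the helper name write_past_eof(), the 100-byte file length, and the temporary path are all assumptions made for illustration. It maps a short file, stores one byte inside the file and one byte past its end (but still within the mapped page), then checks what actually reached the file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define FILE_LEN 100   /* illustrative file length, well under one page */

/*
 * Hypothetical helper: creates a FILE_LEN-byte temporary file, maps its
 * first page, and stores one byte inside the file and one byte beyond EOF.
 * Returns the file size afterward (or -1 on error) and places the byte
 * read back from offset 10 in *inside.
 */
static long write_past_eof(char *inside)
{
    char path[] = "/tmp/mmap_eof_XXXXXX";
    char buf[FILE_LEN];
    struct stat st;
    long page = sysconf(_SC_PAGESIZE);
    long size = -1;
    char *map;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);               /* the file goes away when fd is closed */

    memset(buf, 'a', sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        goto out;

    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    map[10] = 'X';              /* inside the file: will reach the file */
    map[FILE_LEN + 50] = 'Y';   /* past EOF: the file does not grow */

    msync(map, (size_t)page, MS_SYNC);
    munmap(map, (size_t)page);

    if (fstat(fd, &st) == 0 && pread(fd, inside, 1, 10) == 1)
        size = (long)st.st_size;
out:
    close(fd);
    return size;
}
```

The store past the end of the file succeeds without complaint, but the file stays FILE_LEN bytes long; a program which needs that data on disk must extend the file first (with ftruncate(), for example).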
Confusion surrounding the propagation and management of the page dirty bits
is at the top of the suspect list, as of this writing. Nobody seems to be
able to point at anything specific, however, beyond the fact that the code
appears to be rather badly messed up. Says
Linus:
A lot of this is actually historical cruft. Some of it may even be
code that was never supposed to work, but because we maintained
_other_ dirty bits in the PTE's, and never touched them before, we
never even realized that the code that played with PG_dirty was
totally insane.
So the approach being taken by Linus is to
rework the dirty page accounting code into something a little more
reasonable. To that end, test_clear_page_dirty() is no more,
having been pronounced "insane" by Linus. Instead, the new code tries for
a better defined sense of when the dirty bit on a page can be cleared; it
comes down to either (1) the page is being written to backing store,
or (2) the page is no longer relevant (when a file is truncated, for
example). In typical fashion, Linus fixed enough to make his own
configuration work, leaving the rest as an exercise for the reader.
He makes no claims that this rework will have solved the problem, only that
it makes the code more sane than it was before. As of this writing, there
have been no responses from the people who are able to reproduce this
problem. If the problem goes away - and the developers can convince
themselves that it has not just been papered over - then some version of
this fix will likely need to be prepared for a 2.6.19 update. Then, maybe,
the dachshunds can come out of hiding.
NAPI ("new API," though it is not so new anymore) is an interrupt
mitigation mechanism used with network devices. When network traffic is
heavy, the kernel can safely predict that incoming packets will be
available anytime it gets around to looking, so there is no need to have
the adapter interrupting it (possibly thousands of times per second) to
tell it about those packets. So a NAPI-compliant driver will turn off the
packet receive interrupt and provide a poll() method to the kernel. When
the kernel is ready to deal with more packets, poll() will be called with a
maximum number of packets it is allowed to feed into the kernel; it should
process up to that many packets and quit.
With NAPI in place, the kernel can process significantly higher packet
loads. The reduction in interrupt load helps, but there are a couple of
other advantages as well. The way NAPI works makes it less likely that
packets will be reordered in the kernel. And if traffic reaches the point
where the kernel is forced to drop packets, those packets can be dumped
before they are ever fed into the network stack. For more information on
NAPI, see this old LWN article
or this page at
OSDL, which is newer and more complete.
That page may require some updating soon, however, as Stephen Hemminger has
proposed a newer NAPI
(NNAPI?) which changes the driver API somewhat. In the current mainline,
there are two NAPI-related fields in the net_device structure:
poll(), being the function called to collect packets from the
adapter, and weight, which is essentially the driver writer's best
guess as to how important the interface is relative to any others which
might be on the system. Stephen's patch moves these parameters into a
separate structure (struct napi_struct), aggregating them with a
few other NAPI-related structures.
The napi_struct structure is then put back into struct
net_device, but drivers need not use that one. The whole purpose of
this patch would appear to be to separate the NAPI-related information from
specific network devices. There are some adapters which provide multiple
ports, all of which have a single receive interrupt. The separated NAPI
information allows all of those ports to share a single NAPI state and a
single poll() function; this organization better fits the reality
of the hardware.
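As a rough illustration of that organization, here is a user-space sketch in which two ports share one NAPI state. Everything in it is an assumption made for the example: the napi_struct shown is a mock containing only the fields named in this article, struct my_card and struct my_port are hypothetical driver-private structures, and the poll() prototype is the one the patch introduces (taking a napi_struct pointer and a budget).

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Mock of the proposed structure; only the fields named in the text. */
struct napi_struct {
    int weight;
    int (*poll)(struct napi_struct *napi, int budget);
};

#define NUM_PORTS 2

/* Hypothetical per-port state: packets waiting in the receive ring. */
struct my_port {
    int rx_pending;
};

/* Hypothetical multi-port adapter: one interrupt, one shared NAPI state. */
struct my_card {
    struct my_port ports[NUM_PORTS];
    struct napi_struct napi;
};

/* A single poll function drains every port, sharing one budget. */
static int card_poll(struct napi_struct *napi, int budget)
{
    struct my_card *card = container_of(napi, struct my_card, napi);
    int done = 0;

    for (int i = 0; i < NUM_PORTS; i++) {
        while (done < budget && card->ports[i].rx_pending > 0) {
            card->ports[i].rx_pending--;   /* "receive" one packet */
            done++;
        }
    }
    return done;
}
```

A real driver would probably want to service the ports more fairly than this simple front-to-back loop does; the point is only that the NAPI state belongs to the card rather than to any one port.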
This patch won't hit mainline before 2.6.21, so driver authors have some time to
react. The changes are relatively simple to make. The first is to find a
napi_struct structure for the device; in the absence of a reason
to do otherwise, the best solution would be to use the new napi
field in the net_device structure. So, if the current code
initializes itself with something like:
dev->weight = MY_WEIGHT;
dev->poll = my_poll;
The new version would look like this:
dev->napi.weight = MY_WEIGHT;
dev->napi.poll = my_poll;
The prototype of the poll() function has changed a bit, however;
it now looks like:
int (*poll)(struct napi_struct *napi, int budget);
The pointer to the net_device structure has been replaced with a
pointer to the napi_struct structure. In most cases, the
net_device pointer can be had with a call like:
struct net_device *dev = container_of(napi, struct net_device, napi);
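For readers who have not run into container_of() before, the following user-space sketch shows what that call does. The macro here is a simplified stand-in for the kernel's version, and both structures are mocks with illustrative fields; napi_to_dev() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Mock structures; the real ones contain many more fields. */
struct napi_struct {
    int weight;
    int (*poll)(struct napi_struct *napi, int budget);
};

struct net_device {
    char name[16];
    struct napi_struct napi;    /* the embedded NAPI state */
};

/* Hypothetical helper: recover the device from its embedded napi field. */
static struct net_device *napi_to_dev(struct napi_struct *napi)
{
    return container_of(napi, struct net_device, napi);
}
```

The macro simply subtracts the offset of the napi field from the pointer it is given, so it only yields a valid result when the napi_struct really is the one embedded in a net_device structure.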
The meaning of the budget parameter has changed slightly as well;
it is now the only indicator of how many packets the poll()
function may feed into the kernel. There is no longer any need to check
the quota field separately. Finally, the return value should be
the number of packets which were actually processed.
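A poll() function following those rules might be shaped like the user-space sketch below. The structures are mocks, and the rx_pending and rx_delivered fields are invented for the example (a real driver would pull packets from its receive ring and hand them to the network stack); only the prototype, the budget limit, and the return value come from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Mock structures for illustration only. */
struct napi_struct {
    int weight;
    int (*poll)(struct napi_struct *napi, int budget);
};

struct net_device {
    int rx_pending;      /* hypothetical: packets waiting in the adapter */
    int rx_delivered;    /* hypothetical: packets fed into the kernel */
    struct napi_struct napi;
};

/* Process at most 'budget' packets; return how many were processed. */
static int my_poll(struct napi_struct *napi, int budget)
{
    struct net_device *dev = container_of(napi, struct net_device, napi);
    int done = 0;

    while (done < budget && dev->rx_pending > 0) {
        dev->rx_pending--;
        dev->rx_delivered++;
        done++;
    }
    return done;
}
```

Note that budget is the only limit consulted; there is no second counter to check or update.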
The other NAPI-related functions in the network system have been modified
in fairly predictable ways. NAPI polling is started with either of:
void napi_schedule(struct napi_struct *napi);
/* or */
int napi_schedule_prep(struct napi_struct *napi);
void __napi_schedule(struct napi_struct *napi);
Polling is turned off with:
void napi_complete(struct napi_struct *napi);
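Putting the pieces together, a driver's use of these functions might follow the pattern sketched below. This is a user-space mock, not kernel code: the three napi functions are trivial stand-ins which track a single "scheduled" flag, the net_device fields are invented, and the convention of calling napi_complete() when fewer than budget packets were found is an assumption carried over from how NAPI drivers generally behave rather than something the patch posting spells out.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct napi_struct {
    int scheduled;       /* mock state, not a real kernel field */
    int weight;
    int (*poll)(struct napi_struct *napi, int budget);
};

/* Mock stand-ins for the functions listed above. */
static int napi_schedule_prep(struct napi_struct *napi)
{
    if (napi->scheduled)
        return 0;        /* already polling; nothing to do */
    napi->scheduled = 1;
    return 1;
}

static void __napi_schedule(struct napi_struct *napi)
{
    (void)napi;          /* the kernel would queue the poll here */
}

static void napi_complete(struct napi_struct *napi)
{
    napi->scheduled = 0;
}

struct net_device {
    int rx_irq_enabled;  /* hypothetical: receive interrupt state */
    int rx_pending;      /* hypothetical: packets waiting in the adapter */
    struct napi_struct napi;
};

/* Receive interrupt: mask further interrupts and switch to polling. */
static void my_interrupt(struct net_device *dev)
{
    if (napi_schedule_prep(&dev->napi)) {
        dev->rx_irq_enabled = 0;
        __napi_schedule(&dev->napi);
    }
}

/* Poll: when the adapter runs dry, stop polling and unmask interrupts. */
static int my_poll(struct napi_struct *napi, int budget)
{
    struct net_device *dev = container_of(napi, struct net_device, napi);
    int done = 0;

    while (done < budget && dev->rx_pending > 0) {
        dev->rx_pending--;
        done++;
    }
    if (done < budget) {
        napi_complete(napi);
        dev->rx_irq_enabled = 1;
    }
    return done;
}
```

The two-step napi_schedule_prep()/__napi_schedule() form is used here because it gives the driver a place to mask the interrupt only when polling has actually been started.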
The current patch is in an early state, so the interfaces could change over
the next few months. Nobody has spoken out against it, though, so chances
are good that it will be merged in some form.
Page editor: Jonathan Corbet