The current 2.6 prepatch is 2.6.22-rc6, released
by Linus on June 24.
"I'm happy to say that things seem to have calmed down after -rc5,
and that most of this really is just bugfixes and regression fixing in
" This kernel development cycle would appear to be
getting closer to its conclusion; the list of known regressions
is getting short. As always, the long-format changelog
has lots of details.
About 30 patches have been merged into the mainline git repository since
the 2.6.22-rc6 release; they are fixes, mostly in the architecture-specific
and USB code.
There have been no -mm releases over the last week, and no releases of any
stable kernel trees.
Kernel development news
Quite frankly, I personally am considering removing
"checkpatch.pl". That thing is just a nazi dream. That hard-coded
80-character limit etc is just bad taste.
-- Linus Torvalds
The problem IMO is that we are seeing less and less patch review
but it needs to be more and more. Andrew is one of a handful of
people who are reviewing lots of patches. It shouldn't be his
wheelbarrow to have to push around all the time. So if a little
automation can help Andrew, that's a good thing. Until people
revolt, that is.
-- Randy Dunlap
Tasklets are a deferred-execution method used within the kernel; they were
added in the 2.3 development series as a way for interrupt handlers to
schedule work to be done in the very near future. Essentially, a tasklet
is a function to be called (with a data pointer) in a software interrupt as
soon as the kernel is able to do so.
In practice, a tasklet which is scheduled will (probably) be executed when
the kernel either (1) finishes running an interrupt handler, or
(2) returns to user space. Since tasklets run in software interrupt
mode, they must be atomic - no sleeping, references to user space, etc. So
the work that can be done in tasklets is limited, but they are still
heavily used within the kernel.
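As a rough illustration of the idea (a function plus a data pointer, queued now and run a little later), here is a minimal user-space sketch in C. It is not kernel code: the names toy_tasklet, toy_tasklet_schedule(), and toy_run_pending() are hypothetical, and the "software interrupt" is modeled by an ordinary function call. It does mirror one real property of tasklets: scheduling a tasklet which is already pending does not queue it a second time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a tasklet: a function and its data. */
struct toy_tasklet {
    struct toy_tasklet *next;
    int scheduled;                  /* nonzero while waiting to run */
    void (*func)(unsigned long);
    unsigned long data;             /* assumes a pointer fits, as on Linux */
};

/* Pending list; the kernel keeps one of these per CPU. */
static struct toy_tasklet *pending_head;
static struct toy_tasklet **pending_tail = &pending_head;

static void toy_tasklet_init(struct toy_tasklet *t,
                             void (*func)(unsigned long), unsigned long data)
{
    t->next = NULL;
    t->scheduled = 0;
    t->func = func;
    t->data = data;
}

/* What an interrupt handler would call: cheap, and never runs func itself. */
static void toy_tasklet_schedule(struct toy_tasklet *t)
{
    assert(t->func != NULL);
    if (t->scheduled)
        return;                     /* already pending: nothing to do */
    t->scheduled = 1;
    t->next = NULL;
    *pending_tail = t;
    pending_tail = &t->next;
}

/* Stand-in for the softirq pass; returns how many tasklets were run. */
static int toy_run_pending(void)
{
    struct toy_tasklet *t = pending_head;
    int ran = 0;

    /* Detach the list first so that handlers may reschedule themselves. */
    pending_head = NULL;
    pending_tail = &pending_head;

    while (t) {
        struct toy_tasklet *next = t->next;
        t->scheduled = 0;
        t->func(t->data);           /* must not sleep in the real thing */
        ran++;
        t = next;
    }
    return ran;
}

/* Example handler: bump the counter which data points to. */
static void count_event(unsigned long data)
{
    (*(int *)data)++;
}
```

A caller would initialize a toy_tasklet with count_event() and the address of a counter, call toy_tasklet_schedule() from its "interrupt handler," and see the counter change only once toy_run_pending() has been called.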
There is another problem with tasklets: since they run as software
interrupts, they have a higher priority than any process on the system.
Tasklets can, thus, create unbounded latencies - something which the
low-latency developers have long been working to eliminate. Some efforts
have been made to mitigate this problem; if the kernel has a hard time
keeping up with software interrupts it will eventually dump them into the
ksoftirqd process and let them fight it out in the scheduler.
Specific tasklets which have been shown to create latency problems - the
RCU callback handler, for example - have been made to behave better. And
the realtime tree pushes all software interrupt handling into separate
processes which can be scheduled (and preempted) like anything else.
Recently, Steven Rostedt came up with a different approach: why not
just get rid of tasklets altogether? Since the development of tasklets,
the kernel has acquired other, more flexible ways of deferring work; in
particular, workqueues function much like tasklets, but without many of the
disadvantages of tasklets. Since workqueues use dedicated worker
processes, they can be preempted and do not present the same latency
problems as tasklets; as a
bonus, they provide a process context which allows work functions to sleep
if need be. Workqueues, argues Steven, are sufficiently capable that there
is no need for tasklets anymore.
So Steven's patch cleans up the interface in a few ways, and turns the RCU
tasklet into a separate software interrupt outside of the tasklet
mechanism. Then the tasklet code is torn out and replaced with a wrapper
interface which conceals a workqueue underneath. The end result is a
tasklet-free kernel without the need to rewrite all of the code which uses
tasklets.
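The general shape of such a wrapper can be sketched in user-space C. This is an illustration only and is not taken from Steven's patch: every name here (toy_work, toy_queue_work(), compat_tasklet, and so on) is hypothetical, and the worker thread is modeled by a function which drains the queue. The point is that the tasklet-style structure embeds a work item, and the "schedule" call simply queues that item.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, much-simplified work item and workqueue. */
struct toy_work {
    struct toy_work *next;
    int pending;
    void (*fn)(struct toy_work *);
};

struct toy_workqueue {
    struct toy_work *head;
    struct toy_work **tail;
};

/* One shared queue standing in for the workqueue behind the wrapper. */
static struct toy_workqueue compat_wq = { NULL, &compat_wq.head };

static int toy_queue_work(struct toy_workqueue *wq, struct toy_work *w)
{
    assert(w->fn != NULL);
    if (w->pending)
        return 0;                   /* already queued */
    w->pending = 1;
    w->next = NULL;
    *wq->tail = w;
    wq->tail = &w->next;
    return 1;
}

/* Body of the (modeled) worker process; returns the number of items run. */
static int toy_worker_run(struct toy_workqueue *wq)
{
    int ran = 0;

    while (wq->head) {
        struct toy_work *w = wq->head;
        wq->head = w->next;
        if (!wq->head)
            wq->tail = &wq->head;
        w->pending = 0;
        w->fn(w);                   /* process context: could sleep */
        ran++;
    }
    return ran;
}

/* The wrapper: looks like a tasklet to callers, hides a work item inside. */
struct compat_tasklet {
    struct toy_work work;
    void (*func)(unsigned long);
    unsigned long data;
};

static void compat_tasklet_work_fn(struct toy_work *w)
{
    struct compat_tasklet *t =
        toy_container_of(w, struct compat_tasklet, work);
    t->func(t->data);
}

static void compat_tasklet_init(struct compat_tasklet *t,
                                void (*func)(unsigned long),
                                unsigned long data)
{
    t->work.next = NULL;
    t->work.pending = 0;
    t->work.fn = compat_tasklet_work_fn;
    t->func = func;
    t->data = data;
}

static void compat_tasklet_schedule(struct compat_tasklet *t)
{
    toy_queue_work(&compat_wq, &t->work);
}

/* Example handler: add five to the integer which data points to. */
static void add_five(unsigned long data)
{
    *(int *)data += 5;
}
```

Existing callers keep using the init and schedule calls unchanged; only the place where the function eventually runs is different.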
There is little opposition to the idea of eliminating tasklets, though it
is clear that quite a bit of performance testing will be required before
such a change could go into the mainline kernel. But almost nobody likes
the wrapper interface; it is just the sort of compatibility glue that the
"no stable internal API" policy tries to avoid. So there is a lot of
pressure to dump the wrapper and simply convert all tasklet users directly
to workqueues. Needless to say, this is a rather larger job; it's not
surprising that somebody might be tempted to try to avoid it. In any case,
the current patch is good for testing; if the replacement of tasklets will
cause trouble, this patch should turn it up before anybody has gone to the
trouble of converting all the tasklet users.
Another question needs to be answered here, though: does the conversion of
tasklets to workqueues lead to a better interrupt handling path, or should
wider changes be considered? Rather than doing a context switch into a
workqueue process, the system might get better performance by simply
running the interrupt handler as a thread as well. As it happens, the
realtime tree has long done exactly that: all (OK, almost all) interrupt
handlers run in their own threads. The realtime developers have plans to
merge this work within the next few kernel cycles.
Under the current plans, threaded interrupt handlers would probably be a
configuration-time option. But if developers knew that interrupt
handlers would run in process context, they could simply do the necessary
processing in the handler and do away with deferred work mechanisms
altogether. This approach might not work in every driver - for some
devices, it might risk adding unacceptable interrupt response latency -
but, in many cases, it has the potential to simplify and streamline the
situation considerably. The code would not just be simpler - it might just
perform better as well.
Either way, the removal of tasklets would appear to be in the works. As a
step in that direction, Ingo Molnar is looking
for potential performance problems:
So how about the following, different approach: anyone who has a
tasklet in any performance-sensitive codepath, please yell
now. We'll also do a proactive search for such places. We can
convert those places to softirqs, or move them back into hardirq
context. Once this is done - and i doubt it will go beyond 1-2
places - we can just mass-convert the other 110 places to the lame
but compatible solution of doing them in a global thread context.
This is a fairly clear call to action for anybody who is concerned about
the possible performance impact of this change on any particular part of
the kernel. If you think some code needs faster deferred work response
than a workqueue-based mechanism can provide, now is not the time to defer
the work of responding to this request.
Long-time LWN readers will know that the Linux security module (LSM) API is
controversial at best. To many, it has failed in its purpose, which is
enabling the development of competing approaches to hardening Linux systems;
the only significant in-tree security module remains SELinux. Meanwhile,
the LSM interface is easily abused; since it allows the insertion of hooks
into almost any system operation of interest, it can be used by other
modules to provide non-security functionality. The LSM symbols are mostly
exported GPL-only, but it is still possible for binary-only modules to
abuse the LSM operations - and, apparently, some have done so.
SELinux hacker James Morris has been pondering this issue recently; he has
also noticed that the in-tree security modules (SELinux and the small
module implementing capabilities) cannot be unloaded. So, he asked, why
implement a modular interface at all? He has posted a patch which turns LSM into a
static API with no exported symbols. With this patch applied, any needed
security "modules" must be built into the kernel; there is no longer any
way to add them at run time.
There have been a few complaints, but, from your editor's point of view, it
does not seem like anybody has come up with a compelling reason why it must
be possible to unload security modules. Instead, it has been pointed out
that maintaining a coherent security state in the presence of unloadable
modules is nearly impossible. So this patch would appear to have
reasonably good chances of being applied. The only question, perhaps, is
whether the developers feel the need to provide an extended warning period
for developers and users of out-of-tree security modules.
One such module is AppArmor - the GPL-licensed security mechanism
distributed by Novell. AppArmor has remained out of the tree for a long
time while its developers have tried to address the various comments which
have been posted over the years. A new AppArmor patch has been
posted; many things have been fixed, but one of the core points remains:
AppArmor still uses a pathname-based mechanism for its policy enforcement.
This approach sits poorly with developers - especially those in the SELinux
camp - who think that pathnames are an inherently insecure method. In
their view, the only truly secure way to control access to objects is to
put labels on the objects themselves.
It seemed that this dispute had been resolved at the 2006 kernel summit,
where it was determined that the use of pathnames was not enough to keep
AppArmor out of the kernel. That has not stopped people from complaining,
though, and those complaints redoubled when another pathname-based approach
(TOMOYO Linux) was posted recently. If AppArmor does get into the
mainline, it will have to be over the objections of developers who feel
that it is providing false security to its users.
Andrew Morton appears to want to resolve this issue and get it off the
mailing lists; he sees two alternatives:
a) set aside the technical issues and grudgingly merge this stuff
as a service to Suse and to their users (both of which entities are
very important to us) and leave it all as an object lesson in
how-not-to-develop-kernel-features.
b) leave it out and require that Suse wear the permanent cost and
quality impact of maintaining it out-of-tree. It will still be an
object lesson in how-not-to-develop-kernel-features.
It seems that Andrew would rather not be in the position of delivering
object lessons on how not to develop kernel code by whatever means; he
concludes with this request:
Sigh. Please don't put us in this position again. Get stuff
upstream before shipping it to customers, OK? It ain't rocket
science.
At the 2006 summit, Linus took a clear position that the use of pathnames
for security policies seemed reasonable to him. Given that, along with the
fact that AppArmor is being widely distributed, it seems that, sooner
or later, this module should find a home in the mainline - even if it is no
longer in modular form.
The 2.6.22 development cycle is slowly heading toward its conclusion,
meaning that it should be safe to try to list the significant internal API
changes made this time around. They include:
- The mac80211 (formerly "Devicescape") wireless stack has been merged,
creating a whole new API for the creation of wireless drivers,
especially those requiring software MAC support.
- The eth_type_trans() function now sets the
skb->dev field, consistent with how similar functions for
other link types operate. As a result, many Ethernet drivers have
been changed to remove the (now) redundant assignment.
- The header fields in the sk_buff structure have been renamed
and are no longer unions. Networking code and drivers can now just
use skb->transport_header, skb->network_header, and skb->mac_header.
There are new functions for finding specific headers within packets,
including tcp_hdr(), udp_hdr(), and ipip_hdr().
- Also in the networking area: the packet scheduler has been reworked to
use ktime values rather than jiffies.
- The i2c layer has seen significant new changes meant to make i2c
drivers look more like drivers for other buses. There are, for
example, new probe() and remove() methods for
notifying devices when i2c peripherals come and go. Since i2c is not
a self-describing bus, the support code still needs help to know where
i2c devices might be; for many classes of device, this information can
be had from the system BIOS.
- The crypto API has a new set of functions for use with asynchronous
block ciphers. There is also a new cryptd kernel thread
which can run any synchronous cipher in an asynchronous mode.
- The subsystem structure has been removed from the Linux
device model; there never really was any need for it. Most code which
was expecting a struct subsystem argument has been changed to
use the relevant kset instead.
- There is a new version of the in-kernel rpcbind (portmapper) client
which supports versions 2-4 of the rpcbind protocol. The portmapper
API has changed as a result.
- Numerous changes to the paravirt_ops methods have been made.
Additionally, paravirt_ops is no longer a GPL-only export.
- There is a new memory function:
void *krealloc(const void *p, size_t new_size, gfp_t flags);
As one would expect, it changes the size of the allocated memory, moving it
if need be.
- The SLUB allocator has
been merged as an experimental (for now) alternative to the slab
code. The SLUB API generally matches slab, but the handling of
zero-length allocations has changed.
- A new macro has been added to make the creation of slab caches easier:
struct kmem_cache *KMEM_CACHE(struct-type, flags);
The result is the creation of a cache holding objects of the given
struct-type, named after that type, and with the additional
slab flags (if any).
- The SLAB_DEBUG_INITIAL flag has been removed, along with the
associated SLAB_CTOR_VERIFY flag passed to constructors. The
result is a set of changes which ripples through quite a few source
files. The unused SLAB_CTOR_ATOMIC flag is also gone.
- The SuperH architecture has working kgdb support again.
- The ia64 architecture has a new tool which will inject machine check
errors into a running system. Not recommended for production use.
- The deferrable timers
patch has been merged. There is also a new macro for initializing
workqueue entries (INIT_DELAYED_WORK_DEFERRABLE()) which
causes the job to be queued in a deferrable manner.
- The old SA_* interrupt flags have not been removed as
originally scheduled, but their use will now generate warnings at
compile time.
- There is a new list_first_entry() macro which, surprisingly,
gets the first entry from a list.
- The atomic64_t and local_t types are now fully
supported on a wider set of architectures.
- Workqueues have been reworked again. There is a new function:
void cancel_work_sync(struct work_struct *work);
This function tries to cancel a single workqueue entry, be it on the
shared (keventd) or a private workqueue.
Meanwhile run_scheduled_work() has been removed.
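To make the list_first_entry() item above concrete, here is a small user-space sketch. The list_head structure and the macros are re-created here in simplified form so that the example compiles outside the kernel; struct request and first_request_id() are hypothetical. Note that the macro itself does not check for an empty list, so the caller must.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space copies of the kernel's list primitives. */
struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) \
    container_of(ptr, type, member)
/* The new macro: the entry which follows the list head. */
#define list_first_entry(ptr, type, member) \
    list_entry((ptr)->next, type, member)

static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Hypothetical structure which embeds a list_head, kernel-style. */
struct request {
    int id;
    struct list_head link;
};

/* Return the id of the oldest queued request; the queue must not be empty. */
static int first_request_id(struct list_head *queue)
{
    assert(queue->next != queue);   /* the macro will not check this */
    return list_first_entry(queue, struct request, link)->id;
}
```

Before this macro existed, callers wrote list_entry(head->next, ...) by hand; the new form says the same thing with the intent spelled out.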
The LWN 2.6 API changes page is an
ongoing list of API changes in the 2.6 development series.
Patches and updates
Core kernel code
Filesystems and block I/O
- Nick Piggin: fsblock.
(June 24, 2007)
Virtualization and containers
Page editor: Jonathan Corbet