The current 2.6 prepatch remains 2.6.12-rc5. Linus's git repository
contains 200 or so patches; these are mostly fixes, but there is also a
conversion of the IDE driver code to the device model, a new Broadcom
bcm5706 gigabit driver, the removal of the Philips webcam decompression
code, an IPv4 "alias promotion" feature (make a secondary interface address
into the primary if the previous primary is deleted), and an updated CPU
The current -mm tree is 2.6.12-rc5-mm2.
Recent changes to -mm include the pluggable
congestion avoidance modules patch, some filesystem namespace patches,
some scheduler tweaks, and lots of fixes.
The current stable 2.6 kernel is 2.6.11.11, released on May 27.
The current 2.4 kernel is 2.4.31, released by Marcelo on May 31. 2.4.31
contains quite a few fixes and some driver updates, but new features are no
longer being added to 2.4.
Kernel development news
Linus has just merged a patch from Alan Cox
removing some of the new decompression code from the Philips webcam
driver. "The original pwc author raised some questions about the reverse
engineering of the decompressor algorithms used in the pwc driver.
Having done some detailed investigation it appears those concerns that
clean room policy was not followed are reasonable." The hope, at
this point, is to merge an improved version of the driver in 2.6.13 which
will support (properly reverse-engineered) decompression modules in user space.
The first organized Kernel Summit, held in 2001, included a presentation on the NSA
Security-Enhanced Linux project. Linus's response at the time was that
there were several projects out there trying to find the best way to harden
Linux, and that he did not want to have to choose between them. Instead,
he asked for the creation of a generic framework which would allow an
arbitrary security module to be plugged into the system. The result, some
time later, was the Linux Security Module framework; LSM provides a long
list of hooks into kernel operations which allow a security module to veto
any action which violates the rules it is implementing.
The LSM patch ran into some difficulties on its way into the kernel, but it
is now an established part of the internal API.
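
The hook-and-veto idea can be illustrated with a small user-space sketch. Everything named demo_* or strict_* below is hypothetical and invented for this illustration; the real LSM interface has far more hooks and different signatures. The sketch only shows the shape of the mechanism: a table of function pointers, a permissive default, and a core operation that asks the active module for permission before proceeding.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified hook table (not the kernel's structure). */
struct demo_security_ops {
    /* Return 0 to allow the operation, a negative errno to veto it. */
    int (*file_open)(const char *path, int uid);
};

/* Default behavior when no module is loaded: allow everything. */
static int allow_all_file_open(const char *path, int uid)
{
    (void)path;
    (void)uid;
    return 0;
}

static struct demo_security_ops default_ops = {
    .file_open = allow_all_file_open,
};
static struct demo_security_ops *active_ops = &default_ops;

/* Plug in a module; in this sketch only one may be active at a time. */
int demo_register_security(struct demo_security_ops *ops)
{
    if (ops == NULL || ops->file_open == NULL)
        return -EINVAL;
    if (active_ops != &default_ops)
        return -EBUSY;
    active_ops = ops;
    return 0;
}

/* A core operation: consult the hook first, then do the real work. */
int demo_open(const char *path, int uid)
{
    int err = active_ops->file_open(path, uid);

    if (err)
        return err;   /* the security module vetoed the action */
    /* ... the actual open would happen here ... */
    return 0;
}

/* Example policy: only uid 0 may open paths beginning with "/secret". */
static int strict_file_open(const char *path, int uid)
{
    if (strncmp(path, "/secret", 7) == 0 && uid != 0)
        return -EACCES;
    return 0;
}

struct demo_security_ops strict_ops = {
    .file_open = strict_file_open,
};
```

With no module registered, demo_open() allows every request; once strict_ops is registered, an unprivileged open of a path under /secret is refused with -EACCES while other opens proceed as before.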
So some developers were surprised recently when James Morris suggested that perhaps the time has come to
remove the LSM framework. His arguments
are simple: there is only one
serious module using the LSM framework in the intended manner, while
unrelated projects are trying to use it in inappropriate ways.
In the years since LSM was included in the mainline kernel, SELinux
has been the only significant module implemented and also included
in the mainline kernel. So we have a generalized framework for one
user, SELinux, which itself is a generalized framework....
It's dead code, an unnecessary abstraction layer between its one real user,
SELinux, and the core kernel.
James asks: rather than forcing SELinux to conform to a
general-purpose API (of which it is the sole user), why not just wire
SELinux directly into the kernel, get rid of LSM, and be done with it?
SELinux is not truly the only security module out there, of course. The
kernel includes a couple of other modules: a reimplementation of the
capabilities mechanism and "root plug," a module which prevents processes
from running as root unless a specific USB device is plugged in. There are
out-of-tree modules, such as the BSD
securelevels patch and Trustees Linux.
The Immunix (now Novell) AppArmor product includes a
module which uses the LSM framework. AppArmor is a proprietary offering,
but the security module portion of it is GPL-licensed (as is necessary,
since the functions for loading security modules are exported GPL-only).
There does not appear to be a groundswell of support for the idea of
removing the LSM framework from the kernel at this time. That could change
over time, however: increasingly, out-of-tree code is held to be irrelevant
when decisions are made. If SELinux remains the only significant in-tree
user of the LSM framework, LSM will look like useless baggage to more and
more developers. If there are security modules out there which are
reasonable alternatives to SELinux, their developers may want to think
about getting them into the mainline sometime in the not-too-distant future.
Every open file on a Linux system has an associated offset - the current
read or write position within that file. The virtual filesystem code, when
dealing with file positions, performs some basic checks, such as ensuring
that the position is not negative. After all, what sense does it make to
talk about a file position before the beginning of the file?
As it turns out, there is a situation where
a negative file position makes sense. Special files (such as
/dev/mem and /dev/kmem) provide a window into the
system's main memory. The "position" within these files corresponds to the
address of the memory of interest. The interesting thing is that, on the
x86_64 platform, addresses can be negative numbers.
This comes about as follows: this architecture currently uses a 48-bit
address space. The hardware sign-extends the uppermost bit, however, so
any address with that bit set will turn into a negative number. The x86_64
Linux port uses the upper bit to mark kernel space, so kernel addresses
are, in fact, negative. A quick look at /proc/kallsyms confirms this:
ffffffff80100000 T startup_32
ffffffff80100100 T startup_64
ffffffff801001a0 T initial_code
ffffffff801001a8 T init_rsp
ffffffff801001b0 T early_idt_handler
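
The sign extension described above can be checked with a few lines of C. This is a minimal sketch; the function names are made up for the illustration. It copies bit 47 of a 48-bit address into bits 48 through 63, then tests bit 63, which is the sign bit when the value is read as a signed 64-bit file offset.

```c
#include <stdint.h>

/* Sign-extend a 48-bit virtual address to 64 bits: bit 47 is copied
 * into bits 48..63, as the text says the hardware does. */
uint64_t canonical_48(uint64_t addr)
{
    const uint64_t low48 = (UINT64_C(1) << 48) - 1;

    addr &= low48;
    if (addr & (UINT64_C(1) << 47))
        addr |= ~low48;
    return addr;
}

/* A 64-bit value is negative as a signed offset exactly when bit 63 is set. */
int offset_is_negative(uint64_t addr)
{
    return (addr >> 63) != 0;
}
```

Feeding the startup_32 address from the listing (0xffffffff80100000) to offset_is_negative() yields 1, while any address in the lower half of the 48-bit space yields 0.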
The end result is that using /dev/kmem on an x86_64 system is
difficult; any attempt to seek into kernel space will yield an error.
The clear fix is to modify the VFS layer to let negative file positions be
passed through to the underlying filesystem or device driver. The problem
with doing that in a general way, however, is that not all code
(especially in drivers) is prepared to deal with a negative offset.
Suddenly exposing that code to negative offsets could open up no end of
bugs and security problems. So the real solution, as worked out by Al Viro and Linus Torvalds, is
to add a new flag for the file structure called
FMODE_ANY_OFFSET. This flag can only be set within the kernel;
user space has no access to it. So the /dev/kmem driver will be
able to set the flag and work with the full range of offsets, but, for the
rest of the system, nothing will change.
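
A rough sketch of how such a per-file flag might gate the position check follows. Only the name FMODE_ANY_OFFSET comes from the discussion; the flag value, the demo_file structure, and demo_seek_set() are assumptions made up for this example and do not reflect the actual patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag value; the real flag's value is not given in the text. */
#define DEMO_FMODE_ANY_OFFSET 0x1u

/* Simplified stand-in for the kernel's file structure. */
struct demo_file {
    unsigned int f_mode;   /* mode flags, set only by "kernel" code */
    int64_t f_pos;         /* current file position */
};

/* Set an absolute position. Negative offsets are rejected unless the
 * file has opted in through the flag. Returns 0 or a negative errno. */
int demo_seek_set(struct demo_file *file, int64_t offset)
{
    if (offset < 0 && !(file->f_mode & DEMO_FMODE_ANY_OFFSET))
        return -EINVAL;
    file->f_pos = offset;
    return 0;
}
```

The sketch reports success through a separate status code rather than returning the new position, since a negative position would otherwise be indistinguishable from an error value.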
Merging Ingo Molnar's realtime preemption work was never going to be a
quiet process. The noise has, in fact, begun long before Ingo has even
proposed his work for inclusion. Now might be a good time to catch up with
the debate as a way of seeing how the arguments might go in the future.
The realtime preemption patches attempt to provide a guaranteed maximum
response time for high-priority user-space processes - just like a "real"
realtime operating system would. This goal is achieved by making
everything in the kernel preemptible. No matter what the kernel is
doing on a given processor, if a higher-priority process becomes runnable,
it will be scheduled immediately. Many changes are required to make the
whole kernel preemptible; the core parts are:
- New locking primitives. The spinlocks used by the kernel can cause
any number of processors to stall while waiting for a lock to become
free. Code which holds a spinlock cannot be preempted, or a
deadlocked kernel could result. The realtime preemption patches
introduce a new mutual exclusion type (the rt_mutex) which does not
spin, and, thus, will not stall a processor. The spinlocks and
semaphores currently used in the kernel are all converted over to the
new rt_mutex type, and all code which runs with spinlocks held becomes
preemptible. The rt_mutex type also implements priority inheritance,
so that a low-priority process will not block a higher-priority
process (for long, at least) by losing the processor while holding an
- Threaded interrupt handlers. Interrupt handlers can create latencies
by monopolizing the processor for long periods of time. The realtime
preemption patch moves interrupt handling into kernel threads, which
contend for the processor with all other processes in the system. If
a certain realtime task is more important than interrupt handling, its
priority can be set accordingly.
- Various other mutual exclusion mechanisms, including read-copy-update,
per-CPU variables, and seqlocks, require that preemption be disabled.
All of these mechanisms are changed for the realtime preemption mode,
usually by making them look more like regular spinlocks.
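
Priority inheritance, mentioned in the first item above, can be sketched in a few lines. This is a toy model under loud assumptions: the demo_* names are invented, a larger number means a more important task, only one lock and one tracked waiter exist, and no real blocking or scheduling happens. It is not the rt_mutex implementation; it only shows the boost-on-contention and restore-on-unlock steps.

```c
#include <stddef.h>

struct demo_task {
    int base_prio;   /* priority assigned to the task */
    int prio;        /* effective priority, possibly boosted */
};

struct demo_rt_mutex {
    struct demo_task *owner;        /* NULL when the lock is free */
    struct demo_task *top_waiter;   /* highest-priority waiter seen */
};

/* Try to take the lock. Returns 1 on success. On contention, returns 0
 * after lending the caller's priority to the current owner; a real
 * implementation would put the caller to sleep at this point. */
int demo_rt_mutex_lock(struct demo_rt_mutex *m, struct demo_task *t)
{
    if (m->owner == NULL) {
        m->owner = t;
        return 1;
    }
    if (m->top_waiter == NULL || t->prio > m->top_waiter->prio)
        m->top_waiter = t;
    if (m->owner->prio < t->prio)
        m->owner->prio = t->prio;   /* priority inheritance boost */
    return 0;
}

/* Release the lock: drop any inherited priority and hand the lock to
 * the top waiter, if there is one. */
void demo_rt_mutex_unlock(struct demo_rt_mutex *m)
{
    struct demo_task *old = m->owner;

    if (old == NULL)
        return;
    old->prio = old->base_prio;
    m->owner = m->top_waiter;
    m->top_waiter = NULL;
}
```

When a low-priority task holds the lock and a high-priority task contends for it, the holder runs at the waiter's priority until it unlocks, so a medium-priority task cannot keep it off the processor in the meantime.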
The realtime preemption patch set (at version -RT-2.6.12-rc5-V0.7.47-10 as of this writing)
is clearly large and intrusive - it would be hard to make fundamental
changes like those listed above any other way. It should be noted that
Ingo has gone out of his way to minimize this intrusiveness, however: the
patch is written to minimize code changes, and the kernel functions as
always if realtime preemption is not selected at configuration time. The
merging of this patch set would not force the new preemption model on
According to Lee Revell, the realtime
preemption patches are already seeing some serious use:
All of the Linux audio oriented distributions are already shipping
-RT kernels, and most of the serious Linux audio users who use
general purpose distros are running it. That's a few thousand
people running it 24/7 for months, and it's been at least a month
since any of these users found a real bug in -RT.
Certainly the discussions that inevitably follow the release of a new
version of the patch set indicate that there is an active user community
out there. Some members of the community are starting to wonder why the
realtime preemption patches have not been merged, and when (if ever) that
might change. The biggest reason is that Ingo has not yet requested that
the patches be included - though many small pieces and fixes from the
realtime patch set have found their way into the mainline. If and when
Ingo does push for inclusion, however, there will be some opposition.
To some developers, the realtime patch seems like a set of questionable
and widespread changes aimed at the needs of a very small user community.
Changing spinlocks into mutexes and moving interrupt handlers into threads
are fundamental changes to how the kernel does things with the potential
for the creation of subtle bugs and performance problems. Reworking things
and adding complexity at that level is not a task that should be undertaken
without a strong need - and many developers do not see a sufficiently
There are some concerns about the performance impact of these changes.
Acquiring an uncontended spinlock is a very fast operation; the rt_mutex
type, with its wait queues and priority inheritance mechanisms, is bound to
be slower. There is some anecdotal
evidence that there is a performance hit to realtime preemption, but
little in the way of real benchmarking has been done. In any case, the
performance penalty should only affect users who have actually enabled the
realtime preemption mode.
Finally, not everybody is convinced that the realtime preemption approach
can solve the real problem: providing an ironclad guarantee that a realtime
process will be scheduled within a given maximum latency. Ingo believes
that this guarantee can be made by eliminating all code within the kernel
which can delay a reschedule; others feel that, to make a guarantee that
can truly be trusted, the entire kernel must be audited and verified. They
have a point: how strong a guarantee would you want before running realtime
Linux in your car's braking system?
Those who want true realtime guarantees, along with developers who simply
do not want to clutter the kernel with realtime mechanisms, argue that a
different approach should be taken. The most commonly suggested
alternative is RTAI-Fusion,
which works (at its core) by interposing a "nanokernel" between Linux and
the bare hardware. The nanokernel guarantees latency by taking the
lowest-level scheduling decisions out of the Linux kernel's hands; it is
kept small and easy to verify. Another project taking a similar approach
which is based on the L4 microkernel.
Since the realtime preemption patch is not being proposed for merging at
this time, no decisions are likely to result from the current, lengthy
discussion. If Ingo has his way, there may
never be one big decision; instead, pieces of the patch will be merged if
and when it makes sense.
So i'm afraid nothing radical will happen anywhere. Maybe we can
have one final flamewar-party in the end when the .config options
are about to be added, just for nostalgia, ok?
There may be some interesting realtime-related sessions at next month's
Kernel Summit in Ottawa, however. Meanwhile, should anybody wish to plow
through the entire thread on linux-kernel, here is the starting point.
Patches and updates
Core kernel code
- dmitry pervushin: SPI core.
(May 31, 2005)
Filesystems and block I/O
Page editor: Jonathan Corbet