The current stable 2.6 kernel is 2.6.24, released by Linus on
January 24. Highlights of this release include control groups (formerly process containers),
the i386/x86_64 architecture merger, changes in the CFS
scheduler, network and PID namespaces, the removal of the
modular security interface, and much more. See LWN's list of merged patches
for more detail, or the always-amazing KernelNewbies Linux Changes
page for much more detail.
The 2.6.25 merge window is open, but the process of picking up patches is
going relatively slowly due to the distractions of linux.conf.au. See the
article below for a summary of what has been merged to date.
For older kernels: 18.104.22.168 was released on
January 27 with about a dozen fixes.
Kernel development news
I skipped a lot of these patches because I just got bored of fixing
rejects. Now is a very optimistic time to be raising patches against
I'm going to work on getting a unified devel tree operating: one which
contains everyone's latest stuff and is updated daily. Basically it'll be
-mm without a couple of the quilt trees. People can then prepare patches
against that, as it seems that most can't be bothered patching against -mm,
let alone building and testing it. More later.
-- Andrew Morton
Even Anton Blanchard's phone calls have a signed-off-by line.
As of this writing, some 3800 patches have been merged into the mainline
git repository since the release of 2.6.24. That is fewer than one might
have expected, but Linus's travel to linux.conf.au is slowing the process
somewhat. Expect more than the usual amount of interesting stuff to be
merged relatively late in the merge window period.
User-visible changes include:
- New drivers have been added for Globe Trotter HSDPA wireless cards,
HIFN 795x crypto accelerator chips, Xceive xc2028 and xc5000 tuners,
Cirrus Logic CS5345 analog-to-digital converters, several Beholder TV
tuners, Syntek DC1125 cameras, Silicon Labs Si470x FM radio receivers,
Atmel AT91CAP9 processors, Qualcomm MSM7X00A processors, Marvell Orion
system-on-a-chip devices, Marvell Feroceon processors, SuperH 7203 and
7263 processors, SGI IP28 systems, R6040 Ethernet adapters, Broadcom
NetXtremeII 10Gb network adapters, RTL8180 and 8185-based wireless
network cards, Microchip ENC28J60 Ethernet chips, and, finally, Atheros-based
wireless network adapters.
- The Seagate ST-02/Future Domain TMC-8xx and PSI240i SCSI drivers have
been removed due to lack of interest and maintenance.
- Salsa20 stream cipher support has been added to the crypto layer (at
least for the x86 architecture - it's an assembly implementation).
- Some realtime work has gone into the scheduler; in particular, the
kernel will be more aggressive about moving tasks between processors
when multiple realtime tasks are contending for the same CPU. The
implementation of cpusets has been made to work more closely with the
scheduler domains mechanism. The option to make the big kernel lock
preemptible has been made the default; eventually the non-preemptible
version will go away altogether. High-resolution timers can be used
for preemption, making fair scheduling more accurate. The group
scheduling feature has been enhanced with realtime support.
- The preemptible read-copy-update patches have been merged.
- Support for the LatencyTop utility has been merged.
- Kprobes support for the ARM architecture has been added.
- The new CLONE_IO flag to clone() causes I/O contexts
(used in the CFQ block I/O scheduler) to be shared with the new child
process.
- The idle class for I/O scheduling has been changed to not be 100%
idle when the device is busy; as a result, it is far less likely to
cause priority inversion problems and is no longer limited to
- A long list of new ext4 features, including large file support,
(very) large filesystem support, journal checksumming, multi-block
allocation, and more, has been added.
- The splice() system call now supports TCP receive streams.
- Controller area network protocol support has been merged.
- The network traffic shaper, long obsolete and scheduled for removal,
has been removed.
- Quite a bit of work has been done on the network namespace code which
was first merged in 2.6.24. Extending namespace awareness through the
entire networking subsystem is a big job which is, at this point,
still incomplete.
Changes visible to kernel developers include:
- Chinese translations of a number of core kernel development
documents have been added to the tree.
- There have been a great many changes to the low-level device model
APIs dealing with kobjects and ksets. These changes have, in turn,
forced a large number of adjustments throughout the tree. See
Documentation/kobject.txt for an
overview of the new API.
- There is a new set of security module functions for dealing with
filesystem mount and unmount operations.
- The chained scatterlist API has been augmented with the sg_table patches.
- There have been some changes to the block request completion API. See
this article for a
description of the new way of doing things.
As of this writing, the merging process has just begun, so expect a long
list again next week. Among other things, the x86 tree update, with 908
changesets, is waiting in the wings. There is quite a bit of code yet to
be merged for this development cycle.
Having applications that use up all the available memory can be a fairly
painful experience. For Linux systems, it generally means a visit from
the out-of-memory (OOM) killer, which will try to find processes to kill.
As one would guess, coming up with rules governing which process to kill is
challenging—someone, somewhere, will always be unhappy with
a choice the OOM killer makes. Avoiding it altogether is the goal
of the mem_notify patch.
When memory gets tight, it is quite possible that applications have memory
allocated—often caches for better performance—that they
could free. After all, it is generally better to lose some performance
than to face the consequences of being chosen by the OOM killer. But,
currently, there is no way for a process to know that the kernel is feeling
memory pressure. The patch provides a way for interested
programs to monitor the /dev/mem_notify file to be notified if
memory starts to run low.
/dev/mem_notify is a character device that signals memory
pressure by becoming readable. Interested programs can open the file and
then use poll() or select() to monitor the file
descriptor. Alternatively, signal-driven I/O can be enabled via the
FASYNC flag and the system will deliver a SIGIO signal to the
process when the device becomes readable. If it becomes readable, the
process should free any memory that it can afford to give up. If enough
memory is freed this way, the kernel will have no need to call in the OOM
killer.
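The poll()-based approach described above might look something like the following user-space sketch. The helper name wait_for_pressure() is invented for illustration; only poll() itself is a real interface here, and /dev/mem_notify exists only on kernels carrying the patch.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until fd becomes readable or the timeout
 * (in milliseconds) expires.
 * Returns 1 if the descriptor is readable (memory pressure signaled),
 * 0 on timeout, and -1 on error.
 */
int wait_for_pressure(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    assert(fd >= 0);
    ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -1;              /* poll() itself failed */
    if (ret == 0)
        return 0;               /* timed out, no pressure reported */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A program would open /dev/mem_notify read-only, call this helper in a loop, and drop its caches whenever it returns 1. Since the helper works on any file descriptor, it can be exercised with an ordinary pipe on systems without the patch.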
The crux of the patch is how to decide that memory pressure is occurring.
mem_notify modifies shrink_active_list() to look for movement of
an anonymous page to the inactive list, which is an indication that some
pages will likely be swapped out soon. When that occurs,
memory_pressure_notify() (with the pressure flag set to 1) will be called for that zone. When the
number of free pages for the zone increases above a threshold (based
on pages_high and lowmem_reserve for the
zone), memory_pressure_notify() is called again, but with the
pressure flag set to 0, effectively ending the memory pressure event for
that zone.
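The set-and-clear behavior of the pressure flag can be modeled in user space roughly as follows. This is only a sketch: struct zone_model and both functions are invented for illustration, and the threshold shown (pages_high plus lowmem_reserve) is an assumption, since the text says only that the threshold is based on those two values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for per-zone state; not the kernel's struct zone. */
struct zone_model {
    unsigned long free_pages;
    unsigned long pages_high;
    unsigned long lowmem_reserve;
    bool pressure;              /* mirrors the pressure flag */
};

/* ASSUMPTION: the patch's exact threshold formula is not given in the text. */
static unsigned long clear_threshold(const struct zone_model *z)
{
    return z->pages_high + z->lowmem_reserve;
}

/* An anonymous page moved to the inactive list: the pressure event begins. */
void on_anon_page_deactivated(struct zone_model *z)
{
    z->pressure = true;         /* models the call with the flag set to 1 */
}

/* The free page count changed: end the event once it exceeds the threshold. */
void on_free_pages_changed(struct zone_model *z, unsigned long free_pages)
{
    z->free_pages = free_pages;
    if (z->pressure && z->free_pages > clear_threshold(z))
        z->pressure = false;    /* models the call with the flag set to 0 */
}
```

The point of the two separate conditions is hysteresis: the event starts on one signal (page deactivation) and ends on a different one (free pages recovering), so the flag does not flap on every allocation.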
If there are numerous processes waiting for a memory pressure notification,
it could be counterproductive to wake them all at once—the "thundering
herd" problem. To combat this, the patch set adds the ability to wake
fewer processes than are waiting on the poll event, by adding the
poll_wait_exclusive() function. poll_wait_exclusive()
will in turn call add_wait_queue_exclusive() so that a
member of the wake_up() family can be used that will limit the number of processes
woken up. Previously, only poll_wait() was available; it uses
add_wait_queue(), which does not provide this ability.
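The difference between exclusive and non-exclusive waiters can be illustrated with a small user-space model. The names below are invented, and the loop is a simplification of what the kernel's wakeup code does: every non-exclusive waiter is woken, but only a limited number of exclusive ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a wait queue entry; not the kernel's wait_queue_t. */
struct waiter_model {
    bool exclusive;             /* queued via the exclusive variant? */
    bool awake;
};

/*
 * Wake every sleeping non-exclusive waiter, plus at most nr_exclusive
 * sleeping exclusive waiters, in queue order.
 * Returns the number of waiters woken.
 */
int wake_up_model(struct waiter_model *q, size_t n, int nr_exclusive)
{
    int woken = 0;

    for (size_t i = 0; i < n; i++) {
        if (q[i].awake)
            continue;           /* already running */
        if (q[i].exclusive) {
            if (nr_exclusive <= 0)
                continue;       /* exclusive budget used up */
            nr_exclusive--;
        }
        q[i].awake = true;
        woken++;
    }
    return woken;
}
```

With four exclusive waiters and nr_exclusive set to 1, only the first is woken; the same queue of non-exclusive waiters would all wake at once, which is the thundering herd the patch is trying to avoid.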
Also, to reduce the frequency of processes waking up to reclaim memory,
memory_pressure_notify() will wake waiting processes at most once every
five seconds.
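A five-second limit of this kind can be sketched in user space as shown below. The structure and function names are invented for illustration, and the sketch uses time_t seconds where kernel code would use its own time source.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define NOTIFY_INTERVAL_SEC 5   /* interval taken from the description above */

/* Hypothetical limiter state. */
struct notify_limiter {
    bool   fired;               /* has any notification been sent yet? */
    time_t last;                /* time of the most recent notification */
};

/*
 * Returns true (and records the time) if a notification may be sent at
 * time now; returns false if the previous one was under five seconds ago.
 */
bool notify_allowed(struct notify_limiter *l, time_t now)
{
    if (l->fired && now - l->last < NOTIFY_INTERVAL_SEC)
        return false;
    l->fired = true;
    l->last = now;
    return true;
}
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the logic easy to test.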
The /proc/zoneinfo output has been changed to include the
mem_notify status. This can be used by a human for diagnostic purposes or by a program to
check the current status of zones for memory pressure.
The embedded community has a lot of interest in seeing this feature get
added to the kernel. Devices like phones and PDAs are often running close
to their memory limits and the OOM killer is currently unavoidable when the
user opens yet another application. With this patch in place, programs
that use a lot of memory, but could get by with less, can be changed to
free up their caches and the like when memory gets tight. As memory-hungry
programs get changed, other users will
benefit as well.
The patch, submitted by Kosaki Motohiro, has been through several
iterations on linux-kernel. The work was originally started by Marcelo
Tosatti, with the fifth version recently posted by Kosaki. Previous
versions have been well received and, with relatively few
comments on this iteration, it would seem to be getting close to being merged.
The 2.6 block layer has traditionally provided a pair of functions by which
a driver could indicate that an I/O request had been completed. A call to
end_that_request_first() signaled the transfer of a certain
amount of data and would return a value indicating whether the request as a
whole was complete. Once all sectors in a request had been transferred, it
was up to the driver to pass the request to
end_that_request_last() for final cleanup. There was also a
function called simply end_request()
which might or might not end
the entire request, depending on how much data had been transferred. This
API has worked for a long time, but it has occasionally proved confusing
for driver developers. It was also hard for drivers to communicate useful
error information with this interface.
So, as of 2.6.25, there will be a new way for
drivers to indicate request completion.
After a block driver has transferred one or more sectors (or failed in the
attempt), it should now make a call to:
int blk_end_request(struct request *rq, int error, unsigned int nr_bytes);
Where rq is the I/O request, error is zero or a negative
error code, and nr_bytes is the number of bytes successfully
transferred. If blk_end_request() returns zero, the request is
fully processed and the driver can forget about it. Otherwise there are
still sectors to be transferred and the driver should continue with the
request.
blk_end_request() must acquire the queue lock to do its job. If
the driver already holds that lock, it should call
__blk_end_request() instead.
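The return convention of blk_end_request() can be shown with a user-space model. This is not kernel code: struct request_model and both functions are invented stand-ins, and only the convention (zero when the request is fully processed, nonzero while data remains) comes from the description above.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for struct request; not the kernel type. */
struct request_model {
    unsigned int bytes_left;    /* data still to be transferred */
    int error;                  /* first error reported, 0 if none */
    int done;                   /* set once the request is fully processed */
};

/*
 * Models the convention only: returns 0 when the request is finished,
 * nonzero while there is still data to transfer.
 */
int model_end_request(struct request_model *rq, int error, unsigned int nr_bytes)
{
    if (error && !rq->error)
        rq->error = error;
    if (nr_bytes >= rq->bytes_left) {
        rq->bytes_left = 0;
        rq->done = 1;
        return 0;
    }
    rq->bytes_left -= nr_bytes;
    return 1;
}

/*
 * Driver-style loop: report completed chunks until the model says the
 * request is finished. Returns the number of calls made, or -1 if
 * chunk is zero (which would never make progress).
 */
int complete_in_chunks(struct request_model *rq, unsigned int chunk)
{
    int calls = 0;

    if (chunk == 0)
        return -1;
    do {
        calls++;
    } while (model_end_request(rq, 0, chunk) != 0);
    return calls;
}
```

A real driver would not loop like this in one place; it would make one call per completed transfer, typically from its interrupt handler, and move on to the next request once the return value is zero.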
Block drivers traditionally did a number of housekeeping tasks between
calls to end_that_request_first() and
end_that_request_last(). These include calling
add_disk_randomness() to contribute to the entropy pool, returning
any tags used with the request, and removing the request from the queue.
All of that stuff is now done within blk_end_request(), so drivers
can forget about it. The occasional driver had to carry out other tasks
between the completion of the request and its removal from the queue. For
drivers with this kind of special need, there is a separate function to
call:
int blk_end_request_callback(struct request *rq, int error,
unsigned int nr_bytes,
int (drv_callback)(struct request *));
In this version, drv_callback() will be called (without the queue
lock held) between the completion of the request and its final cleanup. If
the callback returns a non-zero value, that final cleanup will not be
done. This function will always acquire the queue lock - there is no
version for drivers which have already taken that lock. In general,
though, the use of the callback functionality is likely to be a sign that
the driver is being trickier than it really needs to be.
This change was accompanied by a fair number of patches converting all
in-tree drivers to the new interface. The old completion functions have
been removed, so out-of-tree drivers will need updating before they will
work with 2.6.25.
Page editor: Jake Edge