The 2.6.27 kernel is out, released by Linus on October 9.
For those just tuning in, 2.6.27 includes (among many other
things) UBIFS, support for integrity checking in the block
layer, multiqueue networking,
the ftrace tracing framework,
the lockless page cache, the
relocation of a lot of
firmware, the GSPCA webcam driver set, and a number of extended system calls.
See the always-excellent
KernelNewbies summary for lots more information about this release.
The 2.6.28 merge window is currently open with around 4100 changesets
merged at the time of this writing. See the article below for a summary of
what has been added to the kernel so far in this development cycle.
The current stable kernels are 188.8.131.52 and
184.108.40.206, which were released on
October 8. Both contain a long list of important fixes.
Kernel development news
ooh, I like err_ick and err_fck a lot. They sound like akpm review
comments at the end of a long day.
-- Andrew Morton
You should just get a real name, not that "John Smith"
crud. Something _manly_. Something unique. Something
strong. Something that tells you that you're not just another
Something like "Linus Torvalds". Except not exactly.
-- Linus Torvalds
(Thanks to Matthew Burgess)
On Tue, 14 Oct 2008, Jean Delvare wrote:
> Marek Vasut (1):
> i2c/tps65010: Vibrator hookup to gpiolib
Guys, I know we geeks aren't known for our sex-life, but do we have to
make it so obvious?
-- Linus Torvalds
(thanks to David Lang and
2.6.16 has become a bit dated, and I'll maintain 2.6.27 for a few years
as a replacement.
As with 2.6.16, I'll pickup maintenance when the normal -stable
maintenance ends (at some point after 2.6.28 gets released in January).
It is intended to fill the niche for users who are not using
distribution kernels but want to use a regression-free kernel for a
longer time. It might be a small part of the userbase, but after the
experiences with 2.6.16 I can say that there are quite a few users
who appreciate such an offering.
-- Adrian Bunk
As of this writing, 4193 non-merge changesets have been incorporated for
the 2.6.28 kernel. In other words, this merge window is just beginning,
having merged probably fewer than half of the patches which will eventually
find their way into the mainline. What we see so far are a lot of drivers
and incremental improvements, but not many major changes.
User-visible changes for 2.6.28 include:
- There are new drivers for Analog Devices SSM2602, AD1882A and AD1980 codecs,
Freescale MPC5200 I2S audio devices,
Texas Instruments TLV320AIC26 codecs,
Tascam US-122L USB Audio/MIDI interfaces,
Wolfson Micro WM8580, WM8900, WM8903, and WM8971 audio devices,
Blackfin SPORT peripheral interface controllers,
NVIDIA HDMI HD-audio codecs,
Toshiba RBTX4939 MIPS boards,
Atheros L2 10/100 network adapters,
Cisco 10G Ethernet adapters,
JMicron JMC250 chipset-based network adapters,
QLogic QLGE 10Gb Ethernet adapters,
SMSC LAN95XX-based USB 2.0 10/100 Ethernet devices,
AFEB9260 ARM-based boards (an open source board design),
Arcom/Eurotech VIPER boards,
AT91SAM9X watchdog devices,
ITE IT8716, IT8718, IT8726, and IT8712 Super I/O watchdogs,
W83697UG/W83697UF watchdog devices,
Micron MT9M111 camera chips,
Magic-Pro DMB-TH tuners,
Afatech AF9015 and AF9013 DVB-T USB2.0 receivers,
Conexant cx24116/cx24118 tuners,
DVB cards based on the SDMC DM1105 PCI chip,
Silicon Laboratories SI2109/2110 demodulators,
ST STB6000 DVBS Silicon tuners,
numerous Fujifilm FinePix cameras,
ALi video camera controllers,
WM8400 AudioPlus HiFi codecs, and
SGS-Thomson M48T35 Timekeeper RAM chips.
- Support for the old Sun 4 architecture and ColdFire serial ports has
been removed.
- There is a new sysfs file (unload_heads) which can be
used by a user-space process to tell an ATA disk to retract its heads
and prepare for an impact. When used in conjunction with an
accelerometer, this feature could be used to attempt to preserve a
disk in a falling laptop.
- Improved support for ptrace() - and support for precise event-based
sampling in particular - has been added for the x86 architecture.
- The crypto subsystem has gained support for deterministic ANSI X9.31
A.2.4 pseudo-random number generation.
- The SMACK security module can now be configured to enforce mandatory
access control rules on privileged processes.
- There is a script which can be used to generate a minimal "dummy"
policy for SELinux. The smallest workable policy, it seems, is 587
- Some sound cards can detect whether anything is plugged into their input
and output jacks. The ALSA layer now allows drivers for those cards
to register their jacks and report, through the input layer, the presence
of devices attached to them.
- Work with multiqueue networking continues; 2.6.28 will include the
ability to associate a separate queueing discipline with each internal
- The wireless regulatory
compliance subsystem has been merged.
- The kernel now supports the Phonet packet protocol used by
Nokia cellular modems. See networking/phonet.txt in the kernel
documentation directory for more information.
- Also added to core networking is support for the Distributed Switch
Architecture protocol, with initial support for a number of Marvell
- The netfilter layer has been augmented to support network namespaces.
- The ext4 filesystem has lost the "ext4dev" name; this is a signal that the
developers are preparing to declare it ready for production use.
Ext4 has also gained a set of static tracepoints for use with
SystemTap or other tracing tools.
- The FIEMAP
ioctl() for extent mapping has been added.
- Xen has added CPU hotplugging support.
- Version 4 of the rpcbind protocol is now supported; this enables the
kernel to offer RPC services via IPv6.
- The OCFS2 filesystem has gained a number of features, including POSIX
locks, extended attributes, and use of the JBD2 journaling layer.
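The FIEMAP ioctl() lets user space ask how a file is laid out on disk. The following is a minimal user-space sketch, assuming Linux headers that provide <linux/fiemap.h> and FS_IOC_FIEMAP; the helper count_extents() is our own invention, not a library function. Setting fm_extent_count to zero asks the kernel only to report how many extents it found, without filling in an extent array.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fiemap.h>   /* struct fiemap, FIEMAP_* constants */
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Return the number of extents backing the file at 'path', or -1 with
 * errno set on failure (for example when the filesystem lacks FIEMAP).
 */
long count_extents(const char *path)
{
    struct fiemap fm;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    memset(&fm, 0, sizeof(fm));
    fm.fm_start = 0;
    fm.fm_length = FIEMAP_MAX_OFFSET;  /* map the whole file */
    fm.fm_flags = FIEMAP_FLAG_SYNC;    /* flush dirty data before mapping */
    fm.fm_extent_count = 0;            /* count extents only */

    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0) {
        int saved = errno;             /* close() may clobber errno */
        close(fd);
        errno = saved;
        return -1;
    }
    close(fd);
    return (long)fm.fm_mapped_extents;
}
```

Filesystems that do not implement FIEMAP make the call fail (typically with EOPNOTSUPP), so a careful caller should be ready to fall back to another method.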
Changes visible to kernel developers include:
- Discard request
and request timeout handling have been added to the block layer; a
number of other internal API changes have been made as well. See this article for details.
- Video4Linux2 drivers no longer have their open() function
called with the big kernel lock held. The lock_kernel()
calls have been pushed down into individual drivers within the
mainline tree; external drivers will need to be fixed.
The merge window is likely to remain open until approximately
The 2.6.28 merge window has seen the addition of a number of changes to the
block layer. Here's a summary of the new features and APIs which have gone in.
Solid-state storage devices
There are some enhancements aimed at improving the kernel's support
of solid state storage devices. One of those, the discard API, has been
covered here before. This API allows
high-level block subsystem
users (filesystems) to indicate that a particular range of blocks no longer
contains useful data. That allows the low-level device to incorporate
those blocks into its garbage collection scheme and to stop worrying about
their contents when performing wear leveling.
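Why does this help? The toy model below (entirely hypothetical; real flash translation layers are far more elaborate) tracks which blocks of a single erase unit hold live data. Before the unit can be erased, garbage collection must copy its live blocks elsewhere; blocks the filesystem has discarded can simply be dropped, saving both copying and wear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCKS_PER_ERASE_UNIT 8

/* Hypothetical model of one flash erase unit. */
struct erase_unit {
    bool live[BLOCKS_PER_ERASE_UNIT];  /* true: block holds useful data */
};

/* A discard just tells the device the block's contents are no longer needed. */
void model_discard(struct erase_unit *eu, size_t block)
{
    assert(block < BLOCKS_PER_ERASE_UNIT);
    eu->live[block] = false;
}

/* Number of blocks garbage collection must relocate before erasing the unit. */
size_t blocks_to_copy_before_erase(const struct erase_unit *eu)
{
    size_t n = 0;
    for (size_t i = 0; i < BLOCKS_PER_ERASE_UNIT; i++)
        if (eu->live[i])
            n++;
    return n;
}
```

Without discard, the device must assume every block the filesystem ever wrote is still live, so all eight blocks would be copied.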
Since the initial LWN article, though, the API has changed a little. The
way to issue a discard request is now:
int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
The end_io() parameter seen in previous versions of the API is no
longer present. There is no way for callers to know when the request
completes, or, indeed, if the request completes at all. Since the caller
is indicating a lack of interest in the given sectors, it really should not
matter what the device does thereafter.
There is a filesystem-level function for creating discard requests:
static inline int sb_issue_discard(struct super_block *sb,
Here, the interface is expecting block numbers using the filesystem block
size, rather than 512-byte sectors.
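The conversion from filesystem blocks to 512-byte sectors is a left shift by the difference between the block-size exponent and 9 (since 512 is 2 to the 9th power). A user-space sketch of that arithmetic, with a stand-in sector_t typedef and a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;  /* stand-in for the kernel's sector_t */

/*
 * Convert a number of filesystem blocks, each (1 << blocksize_bits)
 * bytes long, into 512-byte sectors.
 */
sector_t fs_blocks_to_sectors(sector_t blocks, unsigned int blocksize_bits)
{
    assert(blocksize_bits >= 9);  /* a block is at least one sector */
    return blocks << (blocksize_bits - 9);
}
```

With 4096-byte blocks (blocksize_bits of 12), each block covers eight sectors, so block 100 starts at sector 800.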
User-space programs can issue discard requests with the new
BLKDISCARD ioctl() call. Needless to say, such
operations should be done with care; about the only logical user of this
ioctl() would be mkfs programs.
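A minimal sketch of a user-space BLKDISCARD call, assuming <linux/fs.h> defines BLKDISCARD and that the argument is an array of two 64-bit values giving the starting byte offset and the length in bytes (both multiples of 512). The helper name discard_range() is hypothetical, and pointing it at a real device throws away the data in that range.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>       /* BLKDISCARD */
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Discard 'len' bytes starting at byte offset 'start' on the block
 * device at 'path'. Returns 0 on success, -1 with errno set otherwise.
 * WARNING: the contents of the range are gone afterward.
 */
int discard_range(const char *path, uint64_t start, uint64_t len)
{
    uint64_t range[2] = { start, len };  /* byte offset, byte length */
    int fd, ret, saved;

    assert(path != NULL);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ret = ioctl(fd, BLKDISCARD, range);
    saved = errno;                       /* preserve ioctl()'s errno */
    close(fd);
    errno = saved;
    return ret < 0 ? -1 : 0;
}
```

Called on anything other than a block device (a character device such as /dev/null, say), the ioctl() simply fails.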
Block drivers which support discard requests will provide a suitable
function to the block layer:
typedef int (prepare_discard_fn) (struct request_queue *queue,
struct request *rq);
void blk_queue_set_discard(struct request_queue *q,
                           prepare_discard_fn *dfn);
In the absence of a "prepare discard" function, discard requests for the
device will fail.
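The registration pattern can be modeled in user space. In this sketch every name (model_queue, model_queue_set_discard(), and so on) is hypothetical and only mirrors the shape of prepare_discard_fn and blk_queue_set_discard(); the -EOPNOTSUPP failure code is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_queue;

struct model_request {
    unsigned long sector;
    unsigned long nr_sectors;
    int prepared;  /* set by the driver's prepare hook */
};

/* Same shape as prepare_discard_fn, but using model types. */
typedef int (model_prepare_discard_fn)(struct model_queue *q,
                                       struct model_request *rq);

struct model_queue {
    model_prepare_discard_fn *prepare_discard;  /* NULL: no discard support */
};

void model_queue_set_discard(struct model_queue *q,
                             model_prepare_discard_fn *fn)
{
    q->prepare_discard = fn;
}

/* Discards fail unless the driver has registered a prepare function. */
int model_submit_discard(struct model_queue *q, struct model_request *rq)
{
    assert(q != NULL && rq != NULL);
    if (q->prepare_discard == NULL)
        return -EOPNOTSUPP;
    return q->prepare_discard(q, rq);
}

/* Example driver hook: accept the request and mark it prepared. */
int example_prepare_discard(struct model_queue *q, struct model_request *rq)
{
    (void)q;
    rq->prepared = 1;
    return 0;
}
```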
The block layer has also added a flag by which drivers can indicate that a
device is not rotating storage, and, thus, does not suffer from seek
delays. By setting QUEUE_FLAG_NONROT (with
queue_flag_set() or queue_flag_set_unlocked()), a driver
tells the block layer that it is working with a solid state device. I/O
schedulers can use that information to avoid plugging the queue - a useful
technique for combining requests to rotating storage devices, but a useless
operation when there is no seek penalty to avoid.
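A toy illustration of the decision an I/O scheduler might make with such a flag. The bit number, structure, and helpers are all hypothetical stand-ins for the kernel's queue flag machinery, not the real thing.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary bit number for the model; not the kernel's value. */
#define MODEL_QUEUE_FLAG_NONROT 0

struct model_queue_flags {
    unsigned long flags;
};

static void model_flag_set(unsigned int flag, struct model_queue_flags *q)
{
    assert(flag < sizeof(q->flags) * 8);
    q->flags |= 1UL << flag;
}

static bool model_flag_test(unsigned int flag,
                            const struct model_queue_flags *q)
{
    return (q->flags >> flag) & 1UL;
}

/*
 * Plugging holds requests back briefly so that adjacent ones can be
 * merged, saving seeks. Without a seek penalty it only adds latency.
 */
bool scheduler_should_plug(const struct model_queue_flags *q)
{
    return !model_flag_test(MODEL_QUEUE_FLAG_NONROT, q);
}
```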
On large, multiprocessor systems, there can be a performance benefit to
ensuring that all processing of a block I/O request happens on the same
CPU. In particular, data associated with a given request is most likely to
be found in the cache of the CPU which originated that request, so it makes
sense to perform the request postprocessing on that same CPU. With 2.6.28,
sysfs entries for block devices will include an rq_affinity variable.
If it is set to a non-zero value, CPU affinity will be turned on for that
device. According to the patch changelog, turning this feature on can
reduce system time by 20-40% on some benchmarks.
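Turning the feature on from user space amounts to a sysfs write. This sketch assumes the knob lives at /sys/block/<disk>/queue/rq_affinity (check the block-layer sysfs documentation for your kernel) and that the caller runs as root; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the (assumed) sysfs path for a disk's rq_affinity knob. */
int rq_affinity_path(char *buf, size_t len, const char *disk)
{
    int n;

    assert(disk != NULL);
    if (strchr(disk, '/') != NULL)   /* reject names that escape /sys/block */
        return -1;
    n = snprintf(buf, len, "/sys/block/%s/queue/rq_affinity", disk);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Enable (on != 0) or disable completion affinity; returns 0 or -1. */
int set_rq_affinity(const char *disk, int on)
{
    char path[256];
    FILE *f;
    int ok;

    if (rq_affinity_path(path, sizeof(path), disk) < 0)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%d\n", on ? 1 : 0) > 0;
    if (fclose(f) != 0)   /* sysfs may report errors when the write is flushed */
        ok = 0;
    return ok ? 0 : -1;
}
```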
Robust device drivers typically have to be written to handle cases where
devices fail to complete operations they have been instructed to do. In a
few cases, higher-level code helps with this task; the networking layer,
for example, can track outgoing packets and let a driver know when a
transmit operation has taken too long. In most other drivers, though, it's
up to the driver itself to notice when an operation seems to be taking too long.
Like the network subsystem, the block layer manages queues of requested
operations. As of 2.6.28 the block layer will, again like networking, have
a mechanism for notifying drivers about request timeouts; that, in turn,
will allow a bunch of timeout-related code to be removed from the lower
layers. Timeout handling in the block layer can be more complex, though,
and the associated API reflects that complexity.
A block driver must register a function to handle timed-out requests:
typedef enum blk_eh_timer_return (rq_timed_out_fn)(struct request *);
void blk_queue_rq_timed_out(struct request_queue *q,
                            rq_timed_out_fn *fn);
The amount of time a request should be outstanding before timing out is set with:
void blk_queue_rq_timeout(struct request_queue *q,
unsigned int timeout);
The tracking of per-request timeouts is done within the block layer; the
timer for any individual request is started when that request is dispatched
to the driver by the I/O scheduler. Should a request fail to complete
before the timeout period passes, the driver's timeout function will be
called with a pointer to the languishing request. The driver then can do
one of three things:
- Figure out that, in fact, the request was completed as expected, but
that completion had not been noticed by the driver. A dropped
interrupt could bring about such a situation, for example. In this
case, the driver returns BLK_EH_HANDLED, and the request will
be marked as completed.
- Decide that the request needs more time, perhaps because it has been
re-issued by the driver. A BLK_EH_RESET_TIMER will start the
timer again for this request.
- Punt and return BLK_EH_NOT_HANDLED. The block layer
currently does nothing at all when it gets this return code; future plans
appear to include aborting the request within the block layer when
this return value is encountered.
If things look bad, the driver may decide to abort any outstanding
requests, reset the device, and start over. There are a couple of new
functions which can help with this task:
void blk_abort_request(struct request *req);
void blk_abort_queue(struct request_queue *q);
These functions will abort the given request, or all requests on the queue,
as appropriate. Part of that process involves calling the driver's timeout
handler for each aborted request.
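A small model of the abort path, under the same caveats: the structure, handler type, and function names are hypothetical and only illustrate the idea that aborting a queue means running the timeout handler for each request still outstanding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of an outstanding request. */
struct pending_rq {
    int done;     /* already completed normally */
    int aborted;  /* set when the request is aborted */
};

typedef void (abort_timeout_fn)(struct pending_rq *rq);

/* Abort one request: flag it, then hand it to the driver's timeout handler. */
void model_abort_request(struct pending_rq *rq, abort_timeout_fn *handler)
{
    assert(rq != NULL && handler != NULL);
    rq->aborted = 1;
    handler(rq);
}

/* Abort everything still outstanding; returns how many were aborted. */
size_t model_abort_queue(struct pending_rq *rqs, size_t n,
                         abort_timeout_fn *handler)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (rqs[i].done)
            continue;  /* nothing left to abort */
        model_abort_request(&rqs[i], handler);
        count++;
    }
    return count;
}

/* Example driver handler: just counts calls (a real one would clean up). */
static int handler_calls;
void example_timeout_handler(struct pending_rq *rq)
{
    (void)rq;
    handler_calls++;
}
```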
Other changes in brief
Some other block-layer changes include:
- The handling of minor numbers has been changed, allowing disks
to have an essentially unbounded number of partitions. The cost of
this change is that minor numbers may be attached to a different major
number, and they might not all be contiguous; for this reason, drivers
must set the GENHD_FL_EXT_DEVT flag before the extended
numbers will be used. See this
article for more information on this change.
- The prototypes of blk_rq_map_user() and
blk_rq_map_user_iov() have changed; there is now a
gfp_mask parameter. This allows these functions to be used
in atomic context.
- kblockd_schedule_work() has an additional parameter
specifying the relevant request queue.
- The new function bio_kmalloc() behaves much like
bio_alloc(), but it does not use a mempool to guarantee
allocations and can thus fail.
It is, all told, one of the busier development cycles for the block layer
in recent times.
Page editor: Jake Edge