The 2.6.27 merge window remains open, so there is no 2.6 development
kernel release as of this writing. Patches continue to flow into the
mainline repository; see the summary below for the highlights.
The 126.96.36.199 stable update is in the review process as of this
writing; it should be released sometime around July 24. The proposed
update contains 47 patches implementing a wide variety of fixes.
Linus has sent out an announcement that the 2.6.27 merge window is halfway
done, and that he's taking a break for a few days. "In the last couple of days I _have_ merged 50+ trees, and while there's
been some 'heated discussion' about some of them (you know who you are ;),
I'm hoping that we're actually in reasonably good shape even though it's
in the middle of the merge window, and that people will test out the
snapshot kernels even though I'm not ready to do a -rc1 release."
Kernel development news
There is no more distributed storage you knew before, instead there
is completely new project being developed, which main goal is to
provide a transport layer for the block requests only. Consider it
as Network Block Device on huge steroids. Consider it as iSCSI on
huge steroids. Consider it as ATA-over-Ethernet on even more huge
steroids. It is just an example of what all those protocols should
have. And only that.
didn't get the "zero tolerance for doping" memo
If you want the kernel people to endorse your project, you'll have
to please them. Its that simple. If that means having to radically
re-structure your design, and/or break backwards compatibility then
so be it. Such are the costs for not collaborating from the start.
If you stubbornly refuse to co-operate you'll either break the
project or invite a fork/rewrite by someone else if the idea is
deemed worthwhile enough.
-- Peter Zijlstra
Being a good citizen in Linux land often means improving whole
subsystems rather than stuffing a bunch of fancy features into
individual drivers. Working that way can be harder, but it spreads
the benefits wider, and improves Linux as a whole.
FWIW, I would rather see implications thought about *and* mentioned
in the changelogs. OTOH, the above shows the real-world cases when
breakage hadn't even been realized to be security-significant.
Obviously broken behaviour (leak, for example) gets spotted and
fixed. Fix looks obviously sane, bug it deals with - obviously
real and worth fixing, so into a tree it goes... IOW, one _can't_
rely on having patches that close security holes marked as such.
For that the authors have to notice that themselves in the first place.
-- Al Viro
Code cleanups sometimes expose fundamental disagreements about how the code
should look; here some veteran kernel hackers show how it's done.
Rusty, in his peevish way, complained that macros defining
constants should have a name which somewhat accurately reflects the
actual purpose of the constant.
Aside from the fact that PTE_MASK gives no clue as to what's actually
being masked, and is misleadingly similar to the functionally entirely
different PMD_MASK, PUD_MASK and PGD_MASK, I don't really see what the
-- Jeremy Fitzhardinge
Has Rusty ever heard about the economy of the healthy flow of
incoming regressions? What will we do without obscure names and
hard to find bugs? First he writes a simple and readable hypervisor
(ruining a whole industry based on obscurity!) and now that. It's
_so_ unamerican and unaustralian. I'm worried.
-- Ingo Molnar
I am disgusted with this inappropriate emphasis on clarity over
obscurity. It should be pretty clear to everyone here that we
can't have both! Fortunately, there is a way to partially rectify
the situation. Ingo, please apply.
+/* There's something suspicious about this line: see PTE_PFN_MASK comment. */
#define __PHYSICAL_MASK ((phys_addr_t)(1ULL << __PHYSICAL_MASK_SHIFT) - 1)
@@ -19,6 +20,7 @@
/* PTE_PFN_MASK extracts the PFN from a (pte|pmd|pud|pgd)val_t */
+/* This line is quite subtle. See __PHYSICAL_MASK comment above. */
#define PTE_PFN_MASK ((pteval_t)PHYSICAL_PAGE_MASK)
-- Rusty Russell
As of this writing, just over 6200 changesets have been merged into the
mainline git repository since the 2.6.26 release. Merge activity appears
to be slowing down somewhat; most of the major trees seem to have
been pulled. Andrew Morton has not yet started to unload the -mm tree into
the mainline, though; until that happens, the merge window can be expected
to remain open.
User-visible changes merged since last week's summary include:
- There are new drivers for
Samsung S3C SD/MMC interfaces,
Atmel Multimedia card interfaces,
Ricoh Bay1Controller cards,
S/390 QDIO controllers,
Renesas SuperH SH7710 and SH7712 Ethernet controllers,
Option HSDPA/HSUPA mobile network devices,
Broadcom BCM57711 Ethernet adapters,
Mikrotik RouterBoard 532 series boards,
Anysee DVB-T/C USB2.0 receivers,
Sensoray 2255 video capture devices,
Siano SMS10xx digital television devices,
SuperH Mobile CEU camera controllers,
Niagara2 hardware random number generators,
HTC Shift (X9500) touchscreens,
iNexio serial touchscreens,
Sahara TouchIT-213 touchscreens,
Xilinx XPS PS/2 controllers,
Maxim MAX7301 GPIO expanders,
HP iLO/iLO2 management processors,
Atheros L1E Gigabit Ethernet adapters,
Marvell XOR DMA engines,
Synopsys DesignWare DMA controllers, and
Intel version 3.0 I/OAT DMA engines.
There is also a new PCI "slot detection driver" which will attempt to
find all PCI slots in the system and create corresponding entries in
- Worthy of note: the "gspca" set of video drivers, long maintained
outside of the mainline kernel tree, has been merged. These drivers
support a large number of video
devices; with their merge, most video camera devices on the market
are supported by Linux.
- The Fujitsu laptop driver has been updated with better hotkey and
backlight support for more Fujitsu models.
- The UBIFS filesystem for
flash-based storage devices has been merged.
- The multiqueue
networking patches have been merged.
- The IA-64 architecture has gained a paravirt_ops implementation to
- The new directories found at /sys/dev/char and
/sys/dev/block contain pointers to sysfs entries for devices
organized by device number.
Changes visible to kernel developers include:
- The new suspend and
hibernate infrastructure has been merged, providing a wider set of
callbacks for power management events. The PCI and platform bus
interfaces have been enhanced with support for this new
- The TTY layer continues to evolve; significant changes include the
introduction of a new tty_port structure meant to hold
information common to all TTY ports and a rework of the line
- The mac80211 code has a new module which can simulate any number of
IEEE 802.11 radios; it is suitable for testing mac80211 functionality
and associated user-space tools.
- There is a new "rfkill" mechanism for unified handling of "radio off"
switches on wireless devices.
- A number of Video4Linux2 format-related callbacks have been renamed to
make them match the names used with the associated buffer types.
In addition, the vidioc_enum_fmt_vbi_cap() callback has been
deprecated and marked for removal in 2.6.28.
- The videobuf layer now has support for controllers which cannot do
- The USB "gadget" framework has been massively reworked to provide
better support for composite devices.
- The prototype for device_create() has changed:
struct device *device_create(struct class *class,
                             struct device *parent,
                             dev_t devt,
                             void *drvdata,
                             const char *fmt, ...);
Those who see a resemblance to device_create_drvdata() are
right; all in-tree users were converted over to that interface,
the old device_create() was removed, and
device_create_drvdata() was renamed. For now, a macro makes
calls to device_create_drvdata() do the right thing, but that
macro will probably go away before the 2.6.27 final release.
- User-space UIO drivers can now write a signed value to the
/dev/uioX device to enable and disable interrupts.
- Debugfs (finally) has a function for removing an entire directory:
void debugfs_remove_recursive(struct dentry *dentry);
As a result, code creating hierarchies in debugfs no longer needs to
remember the dentry of every file it creates.
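As a rough illustration of the UIO item in the list above, a user-space driver might toggle interrupts with a small helper like the one below. This is only a sketch under stated assumptions: the value is taken to be a 32-bit signed integer, 1 is assumed to mean "enable" and 0 "disable", and the helper name uio_set_irq() is invented here. The caller is expected to have opened the /dev/uioX device already.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any UIO library): write a signed
 * 32-bit value to an already-open UIO file descriptor.
 * Assumption: 1 asks for interrupts to be enabled, 0 for disabled.
 * Returns 0 on success, -1 on error or short write.
 */
int uio_set_irq(int fd, int enable)
{
    int32_t value = enable ? 1 : 0;
    ssize_t written;

    assert(fd >= 0);
    written = write(fd, &value, sizeof(value));
    return written == (ssize_t)sizeof(value) ? 0 : -1;
}
```

A driver would typically call uio_set_irq(fd, 1) after handling each interrupt if the kernel side disables the interrupt line in its handler; whether that applies depends on the specific kernel-side UIO driver.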
The tail end of the 2.6.27 merge window will be covered in next week's LWN
Recent LWN articles on the linux-next tree have noted that, while this tree
has been working well in its role of identifying merge conflicts between
subsystem trees, it has not yet been through a full kernel development
cycle. 2.6.27 will be the first kernel release where linux-next was in
existence for the entire preceding cycle; in theory, everything which goes
into 2.6.27 should have been aged in linux-next first. As the end of the
2.6.27 merge window nears, a look at how linux-next has affected the
process seems warranted.
One might think that linux-next maintainer Stephen Rothwell would be able
to take a break during the merge window; it should mostly be a matter of
watching the linux-next tree drain into the mainline. As it happens, the
daily linux-next postings suggest
a fair amount of scrambling to deal with merge conflicts, build failures,
and more. There are a number of reasons for this, one of which is that
subsystem trees are merged into the mainline in an order which is
completely unrelated to their order in linux-next. Patches which remain in
linux-next are being applied to a highly unstable base.
Another interesting phenomenon has been a fair number of patches appearing
in linux-next during the merge window. Some of these are actually patches
intended for 2.6.28; once maintainers have dumped their 2.6.27 patches into
the mainline, they are starting to acquire stuff for the next time around.
Stephen has asked them not to do that,
requesting that 2.6.28 material not be directed toward linux-next until
after the 2.6.27-rc1 release. The goal is that linux-next should be nearly
empty when 2.6.27-rc1 comes out.
Other patches, though, are intended for 2.6.27 but simply have not done
their time in the linux-next tree. That has led to a certain amount of
developer grumpiness at times. It is interesting to note, though, that one
of the biggest examples of linux-next avoidance - David Miller's merging of
the multiqueue networking code which he had finished writing hours before -
has generated relatively few complaints. But various other types of
conflicts have generated a steady stream of terse notes from Andrew Morton
(who is in the unfortunate position of basing his work on top of
linux-next) on how new stuff should have been in linux-next weeks ago.
Another area of, say, colorful conversation has been around the TTY
subsystem, currently being subjected to a much-needed thrashing by Alan Cox.
Some developers have been unhappy with Alan for merging code which failed
to compile, even though those problems had already been identified in
linux-next. Alan, instead, has become irritated with other developers who
have surprised him with TTY-layer changes of their own, causing Alan's
patches not to apply. Alan has some quaint notions about actually testing
his patches, so the resolution of this kind of conflict requires the
running of a new set of regression tests and such; after this had happened
a few times in a row, he started getting a little short-tempered. These issues
would appear to have been worked out at this point, but the idea behind
linux-next was to keep them from happening in the first place.
Yet another source of occasional merge issues is the rebasing of trees.
Rebasing, in git-speak, is the process of modifying the commit history in a
repository to cause a series of patches to look like they were written
against a later version of the code than they really were. Rebasing can be
a useful technique; it generates a series of patches which applies cleanly
to the current state of the tree without generating a bunch of unsightly
Rebasing can be especially useful in the context of linux-next. If testing
turns up a patch which breaks the build, simply committing a fix will leave
a period in the history where the kernel cannot be built, and that is bad
for people running bisections. With the use of git's history editing
features, the offending patch can be fixed in place and all evidence of the
mistake disappears. In essence, that embarrassing commit mentioning the
Eurasian campaign can be fixed up to properly note that we've always been
at war with Eastasia.
But rebasing a repository changes the history (by design), creating, in the
process, an entirely new set of commits. Those commits are new code, to
the point that any results from testing the older version may no longer
apply. The commits also have new names, so any other developer who was
using a version of the repository will be shaken off and unable to merge.
Issues related to rebasing have come up a couple of times during the merge
window, leading Linus to post a series of lectures on
the problems that rebasing can cause. It is clearly a tool which must be
used with restraint, but occasional use of rebasing can, in the linux-next
context, lead to a better final merge. Finding the right balance is
something each developer will have to learn.
In the end, the merge window remains a bit of an unruly time. The process
of channeling the work of several hundred developers into the mainline over
a two-week period is unlikely to ever be an entirely smooth experience.
But, for all its glitches, the 2.6.27 merge window has been (so far!)
easier than 2.6.26. The presence of the linux-next tree almost certainly
has something to do with that. This tree's role continues to evolve, but
its benefits are starting to be felt.
Three weeks ago, LWN looked at the renewed interest in dynamic tracing,
with an emphasis on
SystemTap. Tracing is a perennial presence on end-user wishlists; it
remains a handy tool for companies like Sun Microsystems, which wish to
show that their offerings (Solaris, for example) are superior to Linux. It
is not surprising that there
is a lot of interest in tracing implementations for Linux; the main
surprise is that, after all this time, Linux still does not have a
top-quality answer to DTrace - though, arguably, Linux had a working tracing mechanism
before DTrace made its appearance.
Even a casual reader of the kernel mailing list will have noticed that
there are a lot of tracing-related patches in circulation at the moment.
There are so many, in fact, that it is hard to keep track of them all. So
this article will take a quick look at the code which has been posted in an
attempt to make the various options a bit clearer.
SystemTap remains the presumptive Linux tracing solution of choice.
It is hampered by a few problems, though, including usability issues, a
complete lack of static trace points in the mainline kernel, and no
user-space tracing capability. On the
usability side, we are seeing a few more kernel developers trying to put
SystemTap to work and posting about the problems they are having. If one
takes as a working hypothesis the notion that, if kernel hackers cannot
make SystemTap work, many other users are likely to encounter difficulties
as well, then one might conclude that addressing the reported problems
would be a priority for the SystemTap developers.
The SystemTap developers do seem to be interested in these reports, which
is a good sign. There are other things happening in the SystemTap arena,
including the release of
version 0.7 on July 15. This release adds a number of new
features and tapsets, and a substantial set of examples as well.
Meanwhile, Anup Shan has posted an interesting
integration of SystemTap and the fault injection framework, allowing
tapsets to control fault injection and trace the results.
James Bottomley has been playing some with the SystemTap code; one result
of that work is changes to
SystemTap's internal relocation code in an attempt to make it more
acceptable for mainline kernel inclusion. There can be no doubt that the
out-of-tree nature of much of the SystemTap support code has made it harder
for that code to progress, so any improvement which makes it more likely
that some of this code will be merged is welcome.
Also by James is this patch
implementing a new way to put markers into the kernel. The addition of
markers (or static tracepoints) has always been problematic in that many of
these markers, by their nature, need to go into some of the hottest code
paths in the kernel. To support dynamic tracing, these markers need to be
available on production systems, so they must work without creating any
significant performance regressions. Quite a bit of work has gone into the
static marker code which is in the kernel (but mostly unused) now, but some
developers are still uncomfortable with putting them into
James's patch addresses these concerns by putting the tracepoints entirely
outside of the code paths. Rather than add some sort of marker to the
code, these markers simply make a note of where in the code the marker
is supposed to be; this note is stored in a separate part of the kernel
binary. That information is enough for a run-time tool to patch in an
actual jump to a tracing function should somebody want to see the
information from that tracepoint. An additional benefit is that these
markers do not interfere with any optimizations done by the compiler. Other
solutions can insert optimization barriers which, while they do make life
easier for the tracing subsystem, also affect the speed of the code even
when the trace points are not active.
The text above said that the kernel's static tracepoint
code is "mostly unused." That would have been better expressed as
"completely," except that the 2.6.27 kernel will include a user in the form
of the ftrace framework. One of the things which makes ftrace truly unique
is that its documentation was not only merged before the code itself, but
well before: the 2.6.26 kernel includes the excellent Documentation/ftrace.txt file.
The ftrace (which stands for "function tracer") framework is one of the
many improvements to come out of the realtime effort. Unlike SystemTap, it
does not attempt to be a comprehensive, scriptable facility; ftrace is much
more oriented toward simplicity. There is a set of virtual files in a
debugfs directory which can be used to enable specific tracers and see the
results. The function tracer after which ftrace is named simply outputs
each function called in the kernel as it happens. Other tracers look at
wakeup latency, events enabling and disabling interrupts and preemption,
task switches, etc. As one might expect, the available information is
best suited for developers working on improving realtime response in
Linux. The ftrace framework makes it easy to add new tracers, though, so
chances are good that other types of events will be added as developers
think of things they would like to look at.
The kernel markers mechanism is meant to be the way that static tracepoints are
inserted into the kernel. To that end, a great deal of effort went into
making these markers fast; they are, for all practical purposes, a set of
no-op instructions until somebody wants to turn one on, at which point the
real tracing code is patched into the running kernel. Since they were
merged, however, kernel markers have been the subject of a few grumbles.
In particular, kernel markers use a somewhat awkward mechanism to ensure
that any arguments passed to the tracing function are interpreted correctly
there. Each marker has a printk()-style format string associated
with it; that string describes the type of each "argument" (a variable
or expression within the code being traced). When tracing code activates a
marker, it will supply a function to be called when the marker is hit and a
format string describing the arguments that the function expects. The
marker code will ensure that both format strings match; otherwise the
marker will not be enabled. The problem is that the format string requires
extra work to write and is only approximate in its specification of the
types involved. These strings can make it clear that a given argument is a
pointer, for example, but they say nothing about what type is pointed to.
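The format-string handshake described above can be modeled in a few lines of user-space C. This is a simplified sketch, not the kernel's marker code: struct marker_model, marker_model_register(), and noop_probe() are hypothetical names invented for illustration, and the comparison is assumed to be a plain string match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Probe signature for this model: the marker's format, then its arguments. */
typedef void (*probe_fn)(const char *format, ...);

/* Hypothetical stand-in for a marker; not the kernel's own structure. */
struct marker_model {
    const char *name;    /* marker name */
    const char *format;  /* printk()-style description of the arguments */
    probe_fn probe;      /* NULL until a tracer attaches */
};

/*
 * Attach a probe only if the format the tracer expects is identical to
 * the one recorded with the marker; otherwise leave the marker disabled.
 */
int marker_model_register(struct marker_model *m,
                          const char *expected_format, probe_fn probe)
{
    assert(m != NULL && expected_format != NULL && probe != NULL);
    if (strcmp(m->format, expected_format) != 0)
        return -1;       /* formats differ: refuse to enable */
    m->probe = probe;
    return 0;
}

/* A do-nothing probe, used only to exercise the registration check. */
void noop_probe(const char *format, ...)
{
    (void)format;
}
```

The weakness discussed in the text shows up directly in this model: a marker declared with "%p" matches any tracer that also says "%p", whether the probe treats the pointer as a run queue or as a task structure.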
In response to various efforts to get around this issue, Mathieu Desnoyers
(the original author of the kernel marker work) has proposed a new
mechanism called tracepoints. They are another
way of putting static trace points into the kernel, but with a simpler and
more type-safe way of putting the pieces together.
With tracepoints, every trace point must be declared in a header file with
a mildly ugly set of macros:
DEFINE_TRACE(tracepoint_name,
             TPPROTO(trace_function_prototype),
             TPARGS(trace_function_args));
This definition will create a new tracepoint called
tracepoint_name. Any function attached to that tracepoint must
have a function prototype as provided in the TPPROTO() macro; the
names of the associated arguments are provided with TPARGS().
Perhaps this is better understood with an example. The tracepoints patch
set includes quite a few static points for use with the LTTng tracing
toolkit. There is one called sched_wakeup which fires whenever
the scheduler wakes up a process. It is defined with:
DEFINE_TRACE(sched_wakeup,
             TPPROTO(struct rq *rq, struct task_struct *p),
             TPARGS(rq, p));
The actual insertion of the tracepoint is a line like this:
trace_sched_wakeup(rq, p);
Note the trace_ prefix added to the supplied name. At this point
in the code, a tracing function can be called with rq (the run
queue of interest) and p (the process which is waking up) as parameters.
Until an actual function is connected to the tracepoint, though, this
declaration is essentially a no-op. Connection of a trace function is done
through a call to:
void my_sched_wakeup_tracer(struct rq *rq, struct task_struct *p);
register_trace_sched_wakeup(my_sched_wakeup_tracer);
The register_trace_sched_wakeup() function (created as part of the
DEFINE_TRACE() definition) will connect the supplied trace
function to the tracepoint. The fact that the function prototype for the
trace function is supplied as part of the tracepoint definition means that
the compiler can perform thorough type checking; if the prototypes do not
match up, compilation will fail. And that, in turn, should put an end to
those embarrassing situations where turning on tracing causes the system to
go down in flames.
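To see why carrying the prototype in the definition buys compile-time checking, consider the following user-space model. It is a sketch under loud assumptions, not the code from the tracepoints patch set: DEFINE_TRACE_MODEL() is an invented macro, the struct rq and struct task_struct types are toy stand-ins, and only a single probe can be attached. The TPPROTO() and TPARGS() helpers here simply pass their arguments through.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types named in the text. */
struct rq { int cpu; };
struct task_struct { int pid; };

/* In this model the helper macros just forward their arguments. */
#define TPPROTO(...) __VA_ARGS__
#define TPARGS(...)  __VA_ARGS__

/*
 * Hypothetical model of a typed tracepoint: generates a probe pointer,
 * a trace_<name>() call site helper, and a register_trace_<name>()
 * function whose parameter type is built from the supplied prototype.
 */
#define DEFINE_TRACE_MODEL(name, proto, args)                        \
    static void (*name##_probe)(proto);                              \
    static inline void trace_##name(proto)                           \
    {                                                                \
        if (name##_probe)                                            \
            name##_probe(args);                                      \
    }                                                                \
    static inline int register_trace_##name(void (*probe)(proto))    \
    {                                                                \
        assert(probe != NULL);                                       \
        if (name##_probe)                                            \
            return -1;  /* this model allows one probe only */       \
        name##_probe = probe;                                        \
        return 0;                                                    \
    }

/* No trailing semicolon: the macro expands to complete definitions. */
DEFINE_TRACE_MODEL(sched_wakeup,
                   TPPROTO(struct rq *rq, struct task_struct *p),
                   TPARGS(rq, p))

static int last_pid = -1;

/* A probe whose signature matches the prototype given above. */
void my_sched_wakeup_tracer(struct rq *rq, struct task_struct *p)
{
    (void)rq;
    last_pid = p->pid;
}

/* Fire the tracepoint once and report the pid the probe last saw. */
int demo_wakeup(int pid)
{
    struct rq rq = { 0 };
    struct task_struct task = { pid };

    trace_sched_wakeup(&rq, &task);
    return last_pid;
}
```

Passing a function with a different signature to register_trace_sched_wakeup() in this model draws an incompatible-pointer-type diagnostic from the compiler, which is the property the text attributes to tracepoints. Until a probe is registered, trace_sched_wakeup() does nothing beyond testing a pointer.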
Interestingly, tracepoints have dispensed with much of the mechanism
developed to minimize the runtime impact of kernel markers; in particular,
they do not use the "immediate values" code. Profiling has shown that the
performance impact of tracepoints is so low that there is little value in
the added complexity of runtime patching of kernel code. Still, there are
signs that some kernel developers will object to the addition of
tracepoints in their current form. Developers want tracing support - but
not at the cost of slower performance, even if that cost is hard to
Finally, Roland McGrath recently surfaced with the tracehook patch set. Tracehook
has a rather different focus; it is, essentially, a cleanup of the way the
kernel handles the ptrace() system call. The tracehook patches
try to organize all of the process tracing code (much of which is
architecture-dependent) into one place where it can be dealt with as a
Tracehook is meant to be a first step toward the merging of a new version
of the utrace code. Utrace
has long been planned as the successor to the current ptrace()
implementation, which has few admirers. But utrace has encountered a
number of difficulties, so its path into the kernel has been slow. It
disappeared from the lists entirely for a while, but a new version of the
patches is said to be coming soon; Roland notes that he expects "some
vigorous feedback" when that happens.
The real importance of the ptrace() rework is that it is the path
toward integrated tracing of kernel- and user-space events. And that, of
course, is one of the biggest features offered by DTrace which is not yet
available in SystemTap. Getting user-space tracing into the kernel -
especially if it could work with the tracepoints already being inserted
into some applications for DTrace - would be a major step forward for
Linux. A lot of people will be watching when this patch set comes around
Meanwhile, Roland would like to see the tracehook code merged for 2.6.27.
He is late to the party, though, and this code has not done any time in
linux-next. So it is not yet clear whether tracehook will go in before the
merge window closes, or whether, instead, it will have to wait for 2.6.28.
As can be seen, there is a lot happening in the area of tracing support for
Linux. Tracing, it seems, is an idea whose time has come, at last. If the
pieces described here can be merged and integrated into a unified
framework, and if it can all be made sufficiently easy to use, the time for
"DTrace envy" will come to an end. Those "ifs" are not small ones,
though. There is quite a bit of work to be done yet; hopefully the current
level of energy will remain until the job is done.
Page editor: Jonathan Corbet