The current development kernel is 3.6-rc3,
released on August 22. Linus says:
"Shortlog appended, there's nothing here that makes me go 'OMG!
Scary!' or makes me want to particularly mention it separately. All just
random updates and fixes."
Previously, 3.6-rc2 was released on
August 16. "Anyway, with all that said, things don't seem too
bad. Yes, I ignored a few pull requests, but I have to say that there
weren't all that many of those, and the rest looked pretty calm. Sure,
there's 330+ commits in there, but considering that it's been two weeks,
that's about expected (or even a bit low) for early -rc's. Yes, 3.5 may
have been much less for -rc2, but that was unusual."
Stable updates: 3.2.28 and one other stable kernel were released during the past week.
Comments (1 posted)
Our power consumption is worse than under other operating systems
is almost entirely because only one of our three GPU drivers
implements any kind of useful power management.
— Matthew Garrett
Moving 'policy' into user-space has been an utter failure, mostly
because there's not a single project/subsystem responsible for
getting a good result to users. This is why I resist "policy should
not be in the kernel" meme here.
— Ingo Molnar
"inline" is now a vague, pathetic and useless thing. The problem
is that the reader just doesn't *know* whether or not the writer
really wanted it to be inlined.
If we have carefully made a decision to inline a function, we
should (now) use __always_inline. If we have carefully made a
decision to not inline a function, we should use noinline. If we
don't care, we should omit all such markings.
This leaves no place for "inline"?
— Andrew Morton
Copy and paste is the #1 cause for subtle bugs.
— Thomas Gleixner
Comments (23 posted)
Greg Kroah-Hartman has announced that the 3.4 kernel will receive stable
updates for a period of at least two years. It joins 3.0 (which has at
least one more year of support) on the long-term support list.
Full Story (comments: 3)
Kernel development news
Years of work to improve power utilization in Linux have made one thing
clear: efficient power behavior must be implemented throughout the system.
That certainly includes the CPU scheduler, but the kernel's scheduler
currently has little in the way of logic aimed at minimizing power use. A
recent proposal has started a discussion on how the
scheduler might be made to be more power-aware. But, as this discussion
shows, there is no single, straightforward answer to the question of how
power-aware scheduling should be done.
Interestingly, the scheduler did have power-aware logic from 2.6.18
through 3.4. There was a sysctl knob (sched_mc_power_savings)
that would cause the scheduler to try to group runnable processes onto the
smallest possible number of cores, allowing others to go idle. That code
was removed in 3.5 because it never worked very well and nobody was putting
any effort into improving it. The result was the removal of some rather
unloved code, but it also left the scheduler with no power awareness at
all. Given the level of interest in power savings in almost every
environment, having a power-unaware scheduler seems less than optimal; it
was only a matter of time until somebody tried to put together a better
solution.
Alex Shi started off the conversation with a
rough proposal on how power awareness might be added back to the
scheduler. This proposal envisions two modes, called "power" and
"performance," that would be used by the scheduler to guide its decisions.
Some of the first debate centered around how that policy would be chosen,
with some developers suggesting that "performance" could be used while on
AC power and "power" when on battery power. But that policy entirely
ignores an important constituency: data centers. Operators of data
centers are becoming increasingly concerned about power usage and its
associated costs; many of them are likely to want to run in a lower-power
mode regardless of where the power is coming from. The obvious conclusion
is that the kernel needs to provide a mechanism by which the mode can be
chosen; the policy can then be decided by the system administrator.
The harder question is: what would that policy decision actually do? The
old power code tried to cause some cores, at least, to go completely idle
so that they could go into a sleep state.
The proposal from Alex takes a different approach. Alex claims
that trying to idle a subset of the CPUs in the system is not going to save
much power; instead, it is best to spread the runnable processes across the
system as widely as possible and try to get to a point where all
CPUs can go idle. That seems to be the best approach, on x86-class
processors, anyway. On that architecture, no processor can go into a deep
sleep state unless they all go into that state; having even a single
processor running will keep the others in a less efficient sleep state. A
single processor also keeps associated hardware — the memory controller,
for example — in a powered-up state. The first CPU is by far the most
expensive one; bringing in additional CPUs has a much lower incremental cost.
So the general rule seems to be: keep all of the processors busy as long as
there is work to be done. This approach should lead to the quickest
processing and best cache utilization; it also gives the best power
utilization. In other words, the best policy for power savings looks a
lot like the best policy for performance. That conclusion came as a
surprise to some, but it makes some sense; as Arjan van de Ven put it:
So in reality, the very first thing that helps power, is to run
software efficiently. Anything else is completely secondary. If
placement policy leads to a placement that's different from the
most efficient placement, you're already burning extra power...
So why bother with multiple scheduling modes in the first place? Naturally
enough, there are some complications that enter this picture and make it a
little bit less neat. The first of these is that spreading load across
processors only helps if the new processors are actually put to work for a
substantial period of time, for values of "substantial" around 100μs.
For any shorter period, the cost of bringing the CPU out of even a shallow
sleep exceeds the savings gained from running a process there. So extra
CPUs should not be brought into play for short-lived tasks. Properly
implementing that policy is likely to require that the kernel gain a better
understanding of the behavior of the processes running in any given
system.
There is also still scope for some differences of behavior between the two
modes. In a performance-oriented mode, the scheduler might balance tasks
more aggressively, trying to keep the load the same on all processors. In
a power-savings mode, processes might stay a bit more tightly packed onto a
smaller number of CPUs, especially processes that have an observed history
of running for very short periods of time.
But the conversation has, arguably, only barely touched on the biggest
complication of all. There was a lot of talk of what the optimal behavior
is for current-generation x86 processors, but that is far from the only
environment in which Linux runs. ARM processors have a complex set of
facilities for power management, allowing much finer control over which
parts of the system have power and clocks at any given time. The ARM world
is also pushing the boundaries with asymmetric architectures like big.LITTLE; figuring out the optimal task
placement for systems with more than one type of CPU is not going to be an
easy task.
The problem is thus architecture-specific; optimal behavior on one
architecture may yield poor results on another. But the eventual solution
needs to work on all
of the important architectures supported by Linux. And, preferably, it
should be easily modifiable to work on future versions of those
architectures, since the way to get the best power utilization is likely to
change over time. That suggests that the mechanism currently used to
describe architecture-specific details to the scheduler (scheduling domains) needs to grow the ability
to describe parameters relevant to power management as well. An
architecture-independent scheduler could then use those parameters to guide
its behavior. That scheduler will also need a better understanding of
process behavior; the almost-ready
per-entity load tracking patch set may help
in this regard.
Designing and implementing these changes is clearly not going to be a
short-term job. It will require a fair amount of cooperation between the
core scheduler developers and those working on specific architectures.
But, given how long we have been without power management support in
the scheduler, and given that the bulk of the real power savings are to be
had elsewhere (in drivers and in user space, for example), we can wait a
little longer while a proper scheduler solution is developed.
Comments (3 posted)
The kernel tends to place an upper limit on how quickly any given workload
can run, so it is unsurprising that kernel developers are always on the
lookout for ways to make the system go faster. Significant amounts of work
can be put into optimizations that, on the surface, seem small. So when
the opportunity comes to make the kernel go faster without the need to
rewrite any performance-critical code paths, there will naturally be a fair
amount of interest. Whether the "link-time optimization" (LTO) feature
supported by recent versions of GCC is such an opportunity or not is yet to
be proved, but Andi Kleen is determined to find out.
The idea behind LTO is to examine the entire program after the individual
files have been compiled and exploit any additional optimization
opportunities that appear. The most significant of those opportunities
appears to be the inlining of small functions across object files. The
compiler can also be more aggressive about detecting and eliminating unused
code and data. Under the hood, LTO works by dumping the compiler's
intermediate representation (the "GIMPLE" code) into the resulting object
file whenever a source file is compiled. The actual LTO stage is then
carried out by loading all of the GIMPLE code into a single in-core image
and regenerating the (presumably) further-optimized object code.
The LTO feature first appeared in GCC 4.5, but it has only really started
to become useful in the 4.7 release. It still has a number of limitations;
one of those is that all of the object files involved must be compiled with
the same set of command-line options. That limitation turns out to be a
problem with the kernel, as will be seen below.
Andi's LTO patch set weighs in at 74
changesets — not a small or unintrusive change. But it turns out that most
of the changes have the same basic scope: ensuring that the compiler knows
that specific symbols are needed even if they appear to be unused; that
prevents the LTO stage from optimizing them away. For example, symbols
exported to modules may not have any callers in the core kernel itself, but
they need to be preserved for modules that may be loaded later.
To that end, Andi's
first patch defines a new attribute (__visible) used to mark such
symbols; most of the remaining patches are dedicated to the addition of
__visible attributes where they are needed.
Beyond that, there is a small set of fixes for specific problems
encountered when building kernels with LTO. It seems that functions with
long argument lists can get their arguments
corrupted if the functions are
inlined during the LTO stage; avoiding that requires marking the functions
noinline. Andi complains "I wish there was a generic way to
handle this. Seems like a ticking time bomb problem." In general,
he acknowledges the possibility that LTO may introduce new,
optimization-related bugs into the kernel; finding all of those could be a
challenge.
Then there is the requirement that all files be built with the same set of
options. Current kernels are not built that way; different options are
used in different parts of the tree. In some places, this problem can be
worked around by disabling specific optimizations that depend on different
compiler flags than are used in the rest of the kernel. In others, though,
features must simply be disabled to use
LTO. These include the "modversions" feature (allowing kernel modules to
be used with more than one kernel version) and the function tracer.
Modversions seems to be fixable; getting ftrace to work may require changes
to GCC, though.
It is also necessary, of course, to change the build system to use the GCC
LTO feature. As of this writing, one must have a current GCC release; it is
also necessary to install a development version of the binutils package for
LTO to work. Even a minimal kernel requires about 4GB of memory for the
LTO pass; an "allyesconfig" build could require as much as 9GB. Given
that, the use of 32-bit systems for LTO kernel builds is out of the
question; it is still possible, of course, to build a 32-bit kernel on a
64-bit system. The build will also take between two and four times as long
as it does without LTO. So developers are unlikely to make much use of LTO
for their own work, but it might be of interest to distributors and others
who are building production kernels.
The fact that most people will not want to do LTO builds actually poses a
bit of a problem. Given the potential for LTO to introduce subtle bugs,
due either to optimization-related misunderstandings or simple bugs in the
new LTO feature itself, widespread testing is clearly called for before LTO
is used for production kernels. But if developers and testers are
unwilling to do such heavyweight builds, that testing may be hard to come
by. That will make it harder to achieve the level of confidence that will
be needed before LTO-built kernels can be used in real-world settings.
Given the above challenges, the size of the patch set, and the ongoing
maintenance burden of keeping LTO working, one might well wonder if it is
all worth it. And that comes down entirely to the numbers: how much faster
does the kernel get when LTO is used? Hard numbers are not readily
available at this time; the LTO patch set is new and there are still a lot
of things to be fixed. Andi reports that
runs of the "hackbench" benchmark gain about 5%, while kernel builds don't
change much at all. Some networking benchmarks improve as much as 18%.
There are also some unspecified "minor regressions." The numbers are
rough, but Andi believes they are encouraging enough to justify further
work; he also expects the LTO implementation in GCC to improve over time.
Andi also suggests that, in the long term, LTO could help to improve the
quality of the kernel code base by eliminating the need to put inline
functions into include files.
All told, this is a patch set in a very early stage of development; it
seems unlikely to be proposed for merging into a near-term kernel, even as
an experimental feature. In the longer term, though, it could lead to
faster kernels; use of LTO in the kernel could also help to drive
improvements in the GCC implementation that would benefit all projects. So
it is an effort that is worth keeping an eye on.
Comments (44 posted)
In this edition of "ask a kernel developer", I answer a multi-part question
about kernel subsystem maintenance from a new maintainer. The workflow that I use to handle
patches in the USB subsystem is used as an example to hopefully provide a
guide for those who are new to the maintainer role.
As always, if you have unanswered questions
relating to technical or procedural issues in Linux kernel
development, ask them in the comment section, or email them directly to
me. I will try to get to them in another installment down the road.
I have some questions about what I am supposed to be doing at different points
of the release cycle. -rc1 and -rc2 are spelled out in Documentation/HOWTO,
and I have a decent idea that patches I accept should be smaller and fix more
critical bugs as the -rcX's roll out. The big question is what do I do with
all of the other patches that come at random times?
First off, thanks so much for agreeing to maintain a kernel subsystem. Without
maintainers like you, the Linux kernel development process would be much more
chaotic and hard to navigate. I will try to explain how I have set up my
development workflow and how I maintain the different subsystems I am in charge
of. That example can help you determine how you wish to manage your
own development trees, and how to handle incoming patches from developers.
To answer the question: yes, you will receive patches at any point in the
release cycle, but not all of them are appropriate to send to Linus at all
points in time, depending on where we are in the release cycle. I'll go
into more detail below, but for now, realize that in my opinion you should
not require the
other developers to wait for different points in the release cycle, and,
instead, you should hold onto
patches and send them upstream when they are needed. I think it is the
maintainer's job to do that buffering, not the developer's.
How best do I organize my pull-request branches so that developers know
which they can pull as dependencies, and which are for-next? I don't want
to over-organize it, but do want to make it easy for board submitters to
test from my trees.
Should my pull-request branches be long-lived, or should I kill them and
create new ones after each cycle?
It's best to stick with a simple scheme for branches, work with that for a
while, and then if you find that is too limiting, feel free to grow from there.
I have only two development branches in my git trees: one to feed to Linus
for the current release cycle, and one for the next release cycle. This
can be seen in the USB
git tree on kernel.org, which shows three branches:
- master, which tracks Linus's tree
- usb-linus, which contains patches to go to Linus for this release cycle
- usb-next, which contains the patches to go to Linus for the next release cycle
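That layout can be reconstructed in a scratch repository (the branch names come from the article; the single commit is only a stand-in for Linus's tree):

```shell
# Build a toy repository with the three-branch layout described above.
git init -q usb-demo && cd usb-demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "mainline snapshot"
git branch usb-linus    # fixes destined for the current release cycle
git branch usb-next     # work queued for the next merge window
git branch -v           # lists the default branch plus the two work branches
```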
Both of the usb-linus and usb-next branches are included in the nightly
linux-next releases as well. That gives me and the USB developers quick
feedback in case there are merge issues with other development trees or
if there are build issues on other architectures that I missed.
I receive patches from lots of different developers all the time. All
patches, after they pass an initial "is this sane" glance, get copied to
a mailbox that I call TODO. Every few days, depending on my workload,
I go through the mailbox and pick out all of the patches that are to be
applied to various trees I am responsible for. For this example, I'll
search on anything that touches the USB tree and copy those messages to
a temporary local mailbox on the filesystem called s (I name my local
mailboxes with short names for ease of typing, not for any other good
reason).
After digging all of the USB patches out (which is really a simple
filter for all threads that have the "drivers/usb" string in them), I
take a closer look at the patches in the s mailbox.
First I look to find anything that would be applicable to Linus's
current tree. This is usually a bug fix for something that was
introduced during this merge window, or a regression for systems that
were previously working just fine. I pick those out and save them
to another temporary mailbox called s1.
Now it's time to start testing to see if the patches actually apply to
the tree. I go into a directory that contains my usb tree and check to
see what branch I am on:
$ cd linux/work/usb
$ git b
master 6dab7ed Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
* usb-linus 8f057d7 gpu/mfd/usb: Fix USB randconfig problems
usb-next 26f944b usb: hcd: use resource_size_t for specifying resource data
work-linus 8f057d7 gpu/mfd/usb: Fix USB randconfig problems
work-next 26f944b usb: hcd: use resource_size_t for specifying resource data
Note, I have the following aliases in my ~/.gitconfig:
dc = describe --contains
fp = format-patch -k -M -N
b = branch -v
These enable me to use git b
to see the current branch more easily, git fp
to format patches in the style I need them in, and git dc to
determine exactly which release a specific git commit first appeared in.
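The same aliases can be installed from the command line rather than by editing ~/.gitconfig by hand:

```shell
# Each git config --global line writes one alias.* entry to ~/.gitconfig.
git config --global alias.dc 'describe --contains'
git config --global alias.fp 'format-patch -k -M -N'
git config --global alias.b  'branch -v'
git config --global --get alias.b   # prints: branch -v
```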
As you can see by the list of branches, I have a local branch that
mirrors the public versions of the usb-linus and usb-next branches
called work-linus and work-next. I do the testing and development work
in these local branches, and only when I feel they are "good enough" do
I push them to the public facing branches and then out to kernel.org.
So, back to work. As I am working on the patches that are to be sent to
Linus first, let's change to the local working version of that branch:
$ git checkout work-linus
Switched to branch 'work-linus'
Then a quick sanity check to verify that the patches in s1
really will apply to this tree (sadly, they often do not):
$ p1 < ../s1
patching file drivers/usb/core/endpoint.c
patching file drivers/usb/core/quirks.c
patching file drivers/usb/core/sysfs.c
Hunk #2 FAILED at 210.
1 out of 2 hunks FAILED -- saving rejects to file drivers/usb/core/sysfs.c.rej
patching file drivers/usb/storage/transport.c
patching file include/linux/usb/quirks.h
(Note, the 'p1' command is really:
patch -p1 -g1 --dry-run
which I set up as a shell alias years ago, as I quickly got tired of typing
the full thing out.)
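A throwaway demonstration of what the dry run does (written here as a shell function rather than an alias, since aliases are not expanded in non-interactive scripts; the file and patch are made up for the example):

```shell
# Shell-function equivalent of the 'p1' alias described above.
p1() { patch -p1 -g1 --dry-run "$@"; }

# A tiny tree and a one-line patch against it.
mkdir -p demo/drivers
printf 'old\n' > demo/drivers/file.txt
cat > s1 <<'EOF'
--- a/drivers/file.txt
+++ b/drivers/file.txt
@@ -1 +1 @@
-old
+new
EOF
cd demo && p1 < ../s1       # reports whether the patch would apply...
grep old drivers/file.txt   # ...while --dry-run leaves the tree untouched
```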
Here is an example of patches that will not apply to the work-linus branch, but
it turns out that this was my fault. They were generated against the usb-next branch
and really should be queued up for the next merge window, not for this release.
So, let's switch back to the work-next branch, as that is where the patches
belong:
$ git checkout work-next
Switched to branch 'work-next'
And see if they apply there properly:
$ p1 < ../s1
patching file drivers/usb/core/endpoint.c
patching file drivers/usb/core/quirks.c
patching file drivers/usb/core/sysfs.c
patching file drivers/usb/storage/transport.c
patching file include/linux/usb/quirks.h
Then I look at the patches themselves again in my email client, and edit
anything that needs to be cleaned up. The changes could be in
the Subject, the body of
the patch, or any other things that need to be touched up. With developers who
send patches all the time, no changes generally need to be made in this
area; with newer developers, unfortunately, I end up editing this type of
"metadata" all the time.
After the patches look clean, and I've done a review of them again to verify
that I don't notice anything strange or suspicious, I do one last sanity check
by running the checkpatch.pl tool:
$ ./scripts/checkpatch.pl ../s1
total: 0 errors, 0 warnings, 73 lines checked
../s1 has no obvious style problems and is ready for submission.
All looks good, so let's apply them to the branch and see if the build works:
$ git am -s ../s1
Applying: usb/endpoint: Set release callback in the struct device_type \
instead of in the device itself directly
Applying: usb: convert USB_QUIRK_RESET_MORPHS to USB_QUIRK_RESET
$ make -j8
If everything builds, then it's time to test the patches. This
can range from installing the changed kernel and ensuring that everything still
works properly and the new modifications work as they say they should work, to
doing nothing more than verifying that the build didn't break if I do not have
the hardware that the changed driver controls.
After this, if everything looks sane, it's time to push the patches to the
public kernel.org repository, as well as notifying the developer that their
patch was applied to the tree and where they can find it. This I do with
a script called do.sh that has grown over the years; it was
originally based on a script that Andrew
Morton uses to notify developers when he applies their patches. You can find a
copy of it and the rest of the helper scripts I use for kernel development
in my gregkh-linux GitHub repository.
The script does the following:
- generates a patch for every changeset in the local branch that is not in the usb-next branch
- emails the developer that this patch has now been applied and where it can be found
- merges the branch to the local usb-next branch
- pushes the branch to the public git.kernel.org repository
- pushes the branch to a local backup server that is on write-only media
- switches back to the work-next branch
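The first three of those steps can be sketched in a scratch repository; this is a heavily simplified reconstruction, not the real do.sh (the echo stands in for the actual notification mail, and the pushes are omitted):

```shell
# Set up a toy repository with a usb-next branch and a work-next branch
# carrying one extra commit.
git init -q sketch && cd sketch
git -c user.name=d -c user.email=d@example.com commit -q --allow-empty -m base
git branch usb-next
git checkout -q -b work-next
echo 'fix' > quirks.c && git add quirks.c
git -c user.name=d -c user.email=d@example.com commit -q -m "usb: example fix"

# Step 1: generate a patch for every changeset not yet in usb-next.
for commit in $(git rev-list usb-next..work-next); do
    git format-patch -1 -o outgoing/ "$commit" > /dev/null
    # Step 2: tell the author where the patch ended up (mail stubbed out).
    echo "notify $(git log -1 --format=%ae "$commit"): applied to usb-next"
done

# Step 3: merge the work into the local usb-next branch, then return.
git checkout -q usb-next && git merge -q --ff-only work-next
git checkout -q work-next
ls outgoing/   # one generated patch file
```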
With that, I'm free to delete the s1
mailbox and start all over again.
After this, people do sometimes find problems with patches that need to
be fixed up. But, since my trees are public, I can't rebase
them—otherwise any developer who had previously pulled my branches would be
messed up. Instead, I sometimes revert patches, or apply fix-up patches
on top of the current tree to resolve issues. It isn't the cleanest solution at
times, but it is better to do this than rebase a public tree, which is
something that no one should ever do.
Hopefully, this description gives you an idea how you can manage your
trees and the patches sent to you to make things easier for yourself,
the linux-next maintainer, and any developer who relies on your tree.
Comments (14 posted)
Page editor: Jonathan Corbet