The current development kernel is 3.6-rc5, released on September 8. "So
3.6-rc5 is out there, and everything is looking fairly calm. Too calm, in
fact. I'm waiting for the other shoe to drop, when Greg finally crawls his
way out from under his mailbox after the kernel summit and kayaking. I
suspect a few other developers may also have been quiet because of the
kernel summit and related travel."
Stable updates: 3.2.29 was released
on September 12 with the usual set of important fixes.
This is not the linux-kernel mailing list; you do not get to be
rude just because you feel grumpy, disagree with someone else's
reasoning, or drank decaf by accident.
— Bryan O'Sullivan
The legal rights to kernel code of course belong to the kernel
developers, who are actively working to undermine
enforcement. That's not in question here; as frustrating as it is,
this is their legal right.
The moral rights to the code, on the other hand, belong to every
member of the public who, if the GPL were being properly enforced
on the kernel, would have the right to obtain and use this code to
enable them to use previously-unsupported hardware with Linux.
— Rich Felker
Kernel development news
Trond Myklebust led a discussion on day one of the 2012 Kernel Summit on how
to improve the kernel maintainer model. He started with a comment from
Thomas Gleixner that for 21 years the kernel development community has
focused on scaling Linus, but has been rather slower in scaling the
subsystem maintainer role. By this time, Linux is no longer a hobbyist
project, and after 21 years it's probably time to focus more on scaling the
subsystem-maintainer role as well.
Trond noted that the kernel maintainer role is a mishmash that includes
software architect, developer, reviewer, patch monkey, and software
maintainer. In the context of a corporate project, these roles are
typically held by multiple people. Trond noted that the maintainer role is
to some extent already informally split, since we have reviewers, bug
fixers, developers, and so on. However, he was interested to know whether
it makes sense to give maintainers greater freedom to (formally) split out
some of these roles, and if so, he requested that there should be a
mechanism for formally recognizing this in the community (for example, via
suitable annotation in the MAINTAINERS file).
One of the participants asked: what barriers keep somebody from taking on some of a maintainer's work? In response, Ted Ts'o indicated that there is no simple answer to that question, noting that kernel developers tend to set a higher bar for people with whom they have no history, whereas they set a lower bar (in terms of the kinds of changes that they permit) for people who have demonstrated a longer-term commitment to the code. "It's a human nature thing."
The conversation meandered over various topics. Along the way, Paul
Gortmaker noted that the Documentation/SubmittingDrivers file
could do with an update to align with current practices. Dave Jones noted
that, likewise, REPORTING-BUGS could do with an update. In amongst
the other discussion Linus noted that code that needs to go into two
subsystems should be placed in a tree of its own that both subsystems can
pull from, since the alternative (placing the code in one of the trees)
creates confusion when dealing with patches.
The discussion did not reach any definite conclusions about the
maintainer role. However, it's clear that several maintainers are
conscious that just as there was a need to improve Linus's scalability
several years ago, the ever-increasing scale of the Linux kernel project
means that now the subsystem-maintainer role could do with some scalability
improvements of its own.
In a short session toward the close of day one of the 2012 Kernel Summit,
Greg Kroah-Hartman, the maintainer of the stable kernel series, relayed one
of his concerns about the stable kernel and sought questions and feedback
from those present.
Greg stated that he had just one thing to complain about: subsystems
that are not marking patches for stable. Here, Greg mentioned a few of
those subsystems, and at the same time singled Dave Miller out for praise,
noting that Dave was doing a lot of "heavy lifting" for networking. Greg
then opened the session for feedback from others about stable kernel maintenance.
Ted Ts'o noted "I'd love to be able to mark some less urgent
patches as 'stable-deferred', so that if people discover regressions, I
have a chance to pull them back." Greg said that he would
try to implement this functionality, as it is a good idea.
A few people wanted to understand more clearly the criteria that
determine whether a patch should be sent for the stable series, and others
noted that there seemed to be some latitude as to what Greg considered to
be an acceptable patch. Greg acknowledged the latter point, with the
statement that he trusted subsystem maintainers to make the call about what
patches should be sent to email@example.com. As far as
choosing which patches should be sent into stable, people were of
course reminded of Documentation/stable_kernel_rules.txt
and the summary rationale for stable: if the patch would be of
interest for distributions aiming to produce a stable kernel for a
distribution release, then that patch should be submitted for stable.
James Bottomley stated that he got a lot of patches for SCSI that don't
apply to the stable kernel, so he strips the stable tag from them. He
asked: "what should be done in that case?" Greg answered that
he should leave that tag on, and then respond to the automated email he
will get when the patch fails to apply to the stable kernel tree with the
correct patch for that older kernel tree.
Greg concluded by asking whether the current release pace of the stable
series was okay. There was general agreement that the pace—a release
every one to two weeks—was good, and many people expressed
appreciation for the excellent job Greg is doing on the stable kernel.
Day one of the 2012 Kernel Summit saw a discussion on improving kernel
tracing and debugging, led by Jason Wessel and Steven Rostedt. Jason's
particular interest was how to get better tracing information from users
who send in reports for kernel crashes.
Most of the session focused on Jason's proposal for kernel changes
that would allow source line numbers to be displayed as part of the
backtrace that is provided in the event of a kernel crash, so as to allow
easier diagnosis of the source of the crash. The proposed technique is
implemented by including ELF tables with the necessary symbol information
in the compiled kernel. With Jason's patches, use of this feature is
straightforward: the kernel is configured with
CONFIG_KALLSYMS_LINE_LOCATIONS enabled and built with debugging
information included. Once that is done, then events such as kernel panics
will generate a call trace that includes source
file names and line numbers:
Call to panic() with the patch set
[<ffffffff815f3003>] panic+0xbd/0x14 panic.c:111
[<ffffffff815f31f4>] ? printk+0x68/0xd printk.c:765
[<ffffffffa0000175>] panic_write+0x25/0x30 [test_panic] test_panic.c:189
[<ffffffff8118aa96>] proc_file_write+0x76/0x21 generic.c:226
[<ffffffff8118aa20>] ? __proc_create+0x130/0x21 generic.c:211
[<ffffffff81185678>] proc_reg_write+0x88/0x21 inode.c:218
[<ffffffff81125718>] vfs_write+0xc8/0x20 read_write.c:435
[<ffffffff811258d1>] sys_write+0x51/0x19 read_write.c:457
[<ffffffff815f84d9>] ia32_do_call+0x13/0xc ia32entry.S:427
The improved call-tracing information that is provided by these patches
would undoubtedly make life somewhat easier for diagnosing the causes of
some kernel crashes. However, there is a cost: the memory footprint of the
resulting kernel is much larger. During the session, a figure of 20 MB was
mentioned, although in a mail that he later sent
to the kernel summit discussion list, Jason clarified that the figure
was more like 10 MB.
The large increase in kernel memory footprint that results from Jason's
technique immediately generated some skepticism about its usefulness. As
someone pointed out, such a large increase in kernel size would be
unwelcome to users running kernels in cloud-based virtual machines such as
Amazon EC2, where the available memory might be limited to (for example)
0.5 GB. Others suggested that it's probably possible to achieve the same
result via a suitably built kernel that is loaded by kexec() in
the event of a kernel crash. (However, there was some questioning of that
idea also, since that technique might also carry a significant memory
cost.)
Linus then weighed in to argue against the proposal altogether. In his
view, kernel panics are a small enough part of user bug reports that the
cost of this approach is unwarranted; an overhead of something like 1 MB
for the increase in memory footprint would be more reasonable, he
thought. Linus further opined that one can, with some effort, obtain
similar traceback information by loading the kernel into GDB.
Although Jason's proposed patches provide some helpful debugging
functionality, the approach met enough negative response that it seems
unlikely to be merged in anything like its current form. However, Jason may
not be ready to completely give up on the idea yet. In his mail sent soon
after the session, he hypothesized about some modifications to his approach
that might bring the memory footprint of his feature down to something on
the order of 5 MB, as well as other approaches that could be employed so
that the end user had greater control over when and if this feature was
deployed for a running kernel. Thus, it may be that we'll see this idea
reappear in a different form at a later date.
The final session of day one of the 2012 Kernel Summit considered
the linux-next tree and a possible complementary tree.
Steven Rostedt stated that he'd like to have a "linux-devel" tree,
which would serve a similar purpose to that once served by Andrew Morton's
"-mm" tree: it would be a place where reasonably stable code sits for a
while for longer testing. He noted that such a tree might be useful for an
API that hasn't yet stabilized, for example. Steven asked whether others
would also be interested in something like this.
Chris Mason questioned whether such a tree could work in
practice. "When your work and my work are together, people blame me
for your bugs and vice versa." Based on experience with a similar
approach in another project, Ben Herrenschmidt noted another problem: people
started developing against that code base instead of the designated
development base (i.e., the creation of a "linux-devel" might cause some
people to develop against that tree instead of linux-next). Tony
Luck noted that the value of a "linux-devel" tree would depend greatly on
how much testing it received, and the sense was that such a tree would
likely see less testing than linux-next, which itself could do
with more testers.
Of course, even if a "linux-devel" tree was considered
worthwhile, the tree would need a maintainer. In response to the question
of how much work was required to maintain linux-next, the
maintainer, Stephen Rothwell, said it required between four and ten hours
per day, depending on the stage in the kernel-release cycle.
In the end, as Steven Rostedt himself noted, the overall response to the
proposal of a "linux-devel" tree was unenthusiastic.
Attention then briefly turned to the linux-next tree. Ted Ts'o
asked: are people happy with how the tree is working? The overall
consensus seemed to be that it was working well. H. Peter Anvin seemed
to sum up the mood, in stating his overall contentment with
linux-next while noting that "the imperfections of
linux-next are reflections of the fact that it is a real-world
process."
Ted asked in a tone that seemed to expect a negative
answer, "does anyone run linux-next in anger on their
development system?", and was a little surprised to see that quite a
number of kernel developers indicated that they do eat their own dog
food, living pretty much continuously on linux-next as the booted kernel
on the work system that they use on a daily basis.
After more than three years, it's clear that
linux-next is by now an essential part of the kernel-development process.
Ted Ts'o led the final session of this year's Kernel Summit (KS), which was
targeted at discussing the
summit itself. Over the years, there have been various changes to the
format and this year was no exception. The summit was co-located with and
overlapped one day of the Linux Plumbers Conference (LPC); the minisummits were
moved into the middle of the summit as well. Ts'o and others wondered how
well that worked and looked for input on how the meetings should be
structured in the future.
Putting the minisummits on day two (Tuesday August 28) turned that day into
an "all-day hallway track" for those who weren't participating, Ts'o said.
That arrangement had both good and bad points, but was in general
well-received. The all-day
hallway track and minisummits both got a boost from the early arrival of LPC attendees.
The topic choices for day one were good, according to H. Peter Anvin and
others. A little more notice of the schedule would have been useful, Anvin
said, so that participants could prepare for the discussions. Mel Gorman
said that the summit was "sedate" overall, though he thought the topics
were well selected. It was not very "entertaining", though, because there
wasn't any fighting. Christoph Hellwig noted that the people "we fight
with" weren't invited.
James Bottomley wondered if it would have been better to have a "cage
fight" on the first day over the two competing NUMA scheduling approaches.
Linus Torvalds noted that some may have avoided the memcg minisummit (where that
discussion took place), even though they were interested in NUMA
scheduling, so they "didn't have to hear about memcg". But Gorman said
that particular problem may have been best handled "relatively privately"
in the smaller
memory-management-focused group at the memcg minisummit. Opening the
discussion up to larger participation might have "made a bad situation a
hell of a lot worse".
Torvalds had his own complaint about the minisummits: their
schedules. He would have preferred shorter sessions to all-day
meetings, because the all-day format made it harder to switch between them. He sat in on
the PCI minisummit but felt like he would have been coming into the middle
of the ARM minisummit by switching to attend the AArch64 discussion. He would rather see
two-hour pre-announced BoF-like sessions.
Ts'o said some of the minisummit schedules came out quite late, which left
no time to negotiate changes to reduce conflicts. Hellwig said
that what Torvalds was suggesting, perhaps, was the elimination of the
minisummits and instead to roll those discussions into longer LPC
sessions. That might mean that KS and LPC should always be combined,
Bottomley said. But, Arnd Bergmann was not convinced that the influx of LPC
attendees was helpful for the ARM minisummit, which was already too big, he said, and
got overrun with the additional people.
Others saw few problems in the overlap with LPC, to the point where
juxtaposing KS and LPC each year was discussed. One problem with that is
that LPC is a North American conference, whereas KS
moves around the
globe. Next year, LPC will be co-located with LinuxCon in New Orleans,
while KS will either be in Edinburgh with LinuxCon Europe or somewhere in
Asia, possibly Hong Kong. It doesn't matter what the conference is
called, Hellwig said, as long as the format remains and the same types of
attendees are present.
Anvin cautioned against tying LPC to KS, noting that it can be
bad for the other conference in the long run, citing the KS/Ottawa Linux
Symposium combination as an example.
It might be possible to see if LPC had any interest in moving to locations
outside of North America, or setting up meetings like LPC wherever KS is
being held. Chris Mason noted that KS can be a draw for plumbing-layer
developers no matter where it is held. Dirk Hohndel thought that the same kind of KS/LPC
meetings could be set up anywhere and draw in developers from afar as well
as those nearby, noting that Korea or Japan would be good candidates. Ts'o
agreed that these kinds of meetings bring new people into the community. He
said that Hong Kong is under consideration to draw in more Chinese
developers, for example.
While the co-location with LPC was seen to be mostly beneficial, the
addition of LinuxCon and CloudOpen was a bit much. Those conferences started on
Wednesday, which resulted in a large influx of people. That led to some
confusion: the rooms where meetings
had been held the previous two days were no longer available, it was
unclear where to get
the lunch available for KS attendees (and there was confusion over who was
allowed to eat), and so on. Most in the room were not in favor of doing quite that
much overlap in the future. Hohndel noted that the Linux Foundation staff
were going "insane" trying to make it all work, so it is unlikely something
like that will happen again.
In answer to a question from Bottomley,
most present were in favor of moving the KS location each year,
and there were suggestions of other possible venues down the road. Some
were less likely (e.g. Cuba), while others seem quite possible (e.g. South
Korea or Japan again). Changing the usual (northern hemisphere) summer to fall
dates for KS was discussed, but the logistics of moving to spring were
considered difficult. It would have to be done in stages so that the
distance between summits was kept to roughly a year. That also means, for
example, that co-locating with linux.conf.au sometime (which was suggested)
would be hard to do because it is held in January.
The largely minor complaints aside, the general sense from the discussion
was that this year's summit had served its purpose. It got kernel hackers
together to discuss areas where the kernel development process could be
improved. There will undoubtedly be more tweaks to the format over the
years, but the summit itself—like the kernel development
process—is working pretty well.
Thomas Gleixner (Linutronix) led the 2012 Linux Plumbers Realtime
microconference in San Diego this year. This session went from 9:00 AM until
noon on Friday morning and continued the highly civilized tone prevalent across
the sessions of the various co-located conferences this year.
Thomas took a moment while opening the session to reflect on the passing of Dr.
Doug Niehaus and his contributions to real-time operating systems.
Paul E. McKenney (IBM) kicked things off with his presentation on "Getting RCU
Further Out of the Way" (reducing the overhead of RCU). Introducing no-callbacks
CPUs (no-CBs) allows the RCU callbacks as well as the grace period processing to
be offloaded to other CPUs. The callbacks are queued to new lists which require
atomic operations and memory barriers to allow for concurrent access (as they
are no longer created and executed on the same CPU). The prototype is limited by
requiring at least one CPU to handle RCU callbacks in order to wait for the grace
period and ensure the callbacks are run. It supports both polling mode as well
as explicit wake-up by call_rcu(). Peter Zijlstra suggested offloading only the
callback processing by leaving the grace period processing on each CPU and
moving the callbacks to the atomic list. Paul acknowledged it to be a good
intermediate step, but also indicated that offloading the grace period
processing should not be overly difficult. Several people indicated interest in
trying out the feature.
Steven Rostedt (Red Hat) presented on the challenges of working PREEMPT_RT into
mainline in his presentation, "The Banes of Mainline for RT". He discussed
interrupt handling in the PREEMPT_RT kernel, which has periodically swung back
and forth between more and fewer threads in an attempt to balance lowest latency
with lowest overhead as well as maintainability and upstream acceptance.
Per-device interrupt handlers are considered ideal as they allow for finer
control of priorities for handlers on all-too-common shared interrupt
lines. Steven also spent some time discussing common livelock scenarios from mainline that the
PREEMPT_RT kernel has to work around. Firstly, the use of a nested trylock
defeats priority boosting. The solution is to drop the conflicting locks,
acquire the lock and immediately release it, then attempt the lock sequence
again. This approach ensures the priority boosting takes place and the possible
inversion becomes bounded. The practice of __do_softirq() raising its own
softirq in the event of a failed spin_trylock() can lead to a livelock in
PREEMPT_RT where ksoftirqd is run at realtime priority and all
softirqs are run
as preemptable threads. The solution here is to simply acquire the spinlock for
PREEMPT_RT, where the spinlock is converted into a mutex, allowing the lock
holder to be scheduled and complete its critical section. Steven threatened
to get rid of softirqs entirely.
Peter Zijlstra (Red Hat) briefly discussed the SCHED_DEADLINE scheduler,
including a request for would-be users to provide details of their use cases
which he can use to justify the inclusion of the code upstream. While he is in
favor of pushing the patches, apparently even Peter Zijlstra has to provide
convincing evidence before pushing yet another scheduling policy into the
mainline Linux kernel. It was reiterated that many media applications for Mac
OS X use the Mac EDF scheduler. Juri Lelli noted that Scuola Superiore Sant'Anna
has "toy" media players that use SCHED_DEADLINE. Contacting the JACK
community was suggested as well. As SCHED_DEADLINE requires the application to
specify its periodicity and duration, adoption may be slow. Fortunately, there
are straightforward methods of determining these parameters.
Frank Rowand (Sony) prepared some material on the usage of PREEMPT_RT with an
emphasis on the stable trees. He also presented some wishlist items collected
from users. While Thomas produces patch tarballs for the development PREEMPT_RT
tree, Steven currently releases stable PREEMPT_RT as a git branch. While
interest remains for the git branches, Steven has agreed to also release the
stable trees as patch tarballs (including previous releases). Some confusion was
noted regarding the development process for the stable trees, such as which
mailing lists to use, as well as the difference between Steven's stable branches
and the "OSADL Latest Stable" releases. It was agreed to add this information in
a README, and include that in-tree along with the PREEMPT_RT releases.
Some sort of issue tracker was requested. Darren Hart (Intel) and Clark Williams
(Red Hat) agreed to work with bugzilla.kernel.org to get one set up for the
PREEMPT_RT kernel.
Frank continued to lead a lengthy discussion on the stable real-time release
process. The two areas of concern were the amount of testing these trees receive
and which versions to continue supporting (including the number of concurrent
trees). While Steven's stable trees are widely used by numerous organizations,
they do not receive much testing outside his machines before they are released.
It should be noted, however, that any patches to his stable trees must have
first spent some time in Thomas's development tree, and have therefore seen some
testing before they are pulled in.
Carsten Emde's (OSADL) long-term load test
systems, on the other hand, perform sequential long-running cyclictest runs, one
of which recently completed one year of uptime without violating real-time
constraints in 160 billion cycles. Carsten, who was not present, defines the
OSADL Latest Stable criteria as: "all our development systems in the QA Farm
must be running this kernel for at least a month under all appropriate load
scenarios without any problem." Thomas has agreed to add Steven's trees to his
automated testing to help improve the level of testing of the stable releases.
As for the longevity of the long-term stable releases (3.0, 3.2, 3.4, ...)
Steven drops support for a stable release when Greg Kroah-Hartman does. Ben
Hutchings's stable tree appears to be the sole exception: Steven will continue to support
one Hutchings stable tree at a time, so long as time permits.
Darren brought up longer-term support and alignment with the Long-Term Support
Initiative (LTSI), as the Yocto
Project supports both PREEMPT_RT as well as LTSI, and alignment here
significantly reduces the support effort. If an LTSI PREEMPT_RT tree is to be
maintained, someone will need to volunteer for the task. Darren indicated the
Yocto Project is likely to do so.
Following Frank, Luis Claudio Goncalves (Red Hat) discussed the joys of working
with customers on real-world use-cases. Customers often push the boundaries of
what has been tested by running on much larger systems than you might expect.
They also frequently run into performance issues or livelocks with programming
mechanisms that more or less work in mainline, but definitely do not when
running with real-time priority. CPU-bound threads, including busy loops to
avoid latency via polling, can starve the system when run as high-priority
realtime threads, resulting in large latency spikes or system freezes. Running
a single process with 1000 threads leads to heavy lock contention, and
changes would be required for PREEMPT_RT to deal with this scenario well.
The concept of CPU isolation has its own pitfalls. An "isolated" CPU still runs
several kernel threads and requires housekeeping, while users may expect the CPU
to run only the isolated application. In these scenarios, the application is
commonly set to SCHED_FIFO at priority 99, resulting in severe latencies and
system freezes as the lower-priority kernel threads are prevented from running. A
list of the current work that must be performed by the kernel on an isolated CPU
is documented in Gilad
Ben-Yossef's page on the Linux wiki. Some of the issues listed
there have been fixed or at least minimized. Paul's presentation addressed two
of them. Additionally, Luis has volunteered to work on some best practices
documentation, but has asked for people to help review.
In closing, Thomas noted that there would not be a 3.5-rt release, but that he
would be looking at 3.6 for the next PREEMPT_RT release. The further refinement
of IRQ handling was mentioned as one of the most noteworthy changes planned for
the 3.6-rt release.
Thanks to all the presenters, participants, and reviewers, as well as Paul E.
McKenney, whose notes helped to show that two note-takers are better than one.
Patches and updates
Core kernel code
Filesystems and block I/O
- Lucas De Marchi: kmod 10 (September 7, 2012)
Page editor: Jonathan Corbet