
Kernel development

Brief items

Kernel release status

The current development kernel is 3.6-rc5, released on September 8. "So 3.6-rc5 is out there, and everything is looking fairly calm. Too calm, in fact, I'm waiting for the other shoe to drop, when Greg finally crawls his way out from under his mailbox after the kernel summit and kayaking. I suspect a few other developers may also have been quiet because of the kernel summit and related travel."

Stable updates: 3.2.29 was released on September 12 with the usual set of important fixes.

Comments (none posted)

Quotes of the week - the view from outside

This is not the linux-kernel mailing list; you do not get to be rude just because you feel grumpy, disagree with someone else's reasoning, or drank decaf by accident.
Bryan O'Sullivan

The legal rights to kernel code of course belong to the kernel developers, who are actively working to undermine enforcement. That's not in question here; as frustrating as it is, this is their legal right.

The moral rights to the code, on the other hand, belong to every member of the public who, if the GPL were being properly enforced on the kernel, would have the right to obtain and use this code to enable them to use previously-unsupported hardware with Linux.

Rich Felker

Comments (none posted)

Kernel development news

KS2012: Improving the maintainer model

By Michael Kerrisk
September 12, 2012

2012 Kernel Summit

Trond Myklebust led a discussion on day one of the 2012 Kernel Summit on how to improve the kernel maintainer model. He started with a comment from Thomas Gleixner that, for 21 years, the kernel development community has focused on scaling Linus, but has been rather slower in scaling the subsystem maintainer role. Linux is no longer a hobbyist project, and after 21 years it's probably time to focus more on scaling the maintainer role.

Trond noted that the kernel maintainer role is a mishmash that includes software architect, developer, reviewer, patch monkey, and software maintainer. In a corporate project, these roles would typically be held by multiple people. The maintainer role is to some extent already informally split, since we have reviewers, bug fixers, developers, and so on. However, Trond wanted to know whether it makes sense to give maintainers greater freedom to (formally) split out some of these roles and, if so, whether there should be a mechanism for formally recognizing this in the community (for example, via suitable annotation in the MAINTAINERS file).
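No such annotation exists today, so what follows is purely a hypothetical sketch of what an annotated MAINTAINERS entry might look like. The "R:" (designated reviewer) tag and the parenthetical role notes are invented for illustration; tags like M:, L:, S:, and F: are the ones the file actually defines:

    FOO SUBSYSTEM
    M:	Alice Architect <alice@example.org>	(architect, patch monkey)
    R:	Bob Reviewer <bob@example.org>
    L:	foo-devel@example.org
    S:	Maintained
    F:	drivers/foo/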

One of the participants asked: what barriers keep somebody from taking on some of a maintainer's work? In response, Ted Ts'o indicated that there is no simple answer to that question, noting that kernel developers tend to set a higher bar for people with whom they have no history, whereas they set a lower bar (in terms of the kinds of changes that they permit) for people who have demonstrated a longer-term commitment to the code. "It's a human nature thing."

The conversation meandered over various topics. Along the way, Paul Gortmaker noted that the Documentation/SubmittingDrivers file could do with an update to align with current practices. Dave Jones noted that, likewise, REPORTING-BUGS could do with an update. Amongst the other discussion, Linus noted that code that needs to go into two subsystems should be placed in a tree of its own that both subsystems can pull from, since the alternative (placing the code in one of the trees) creates confusion when dealing with patches.

The discussion did not reach any definite conclusions about the maintainer role. However, it's clear that several maintainers are conscious that just as there was a need to improve Linus's scalability several years ago, the ever-increasing scale of the Linux kernel project means that now the subsystem-maintainer role could do with some scalability improvements of its own.

Comments (none posted)

KS2012: Stable kernel management

By Michael Kerrisk
September 12, 2012

2012 Kernel Summit

In a short session toward the close of day one of the 2012 Kernel Summit, Greg Kroah-Hartman, the maintainer of the stable kernel series, relayed one of his concerns about the stable kernel and sought questions and feedback from those present.

Greg stated that he had just one thing to complain about: subsystems that are not marking patches for stable. Here, Greg mentioned a few of those subsystems, and at the same time singled Dave Miller out for praise, noting that Dave was doing a lot of "heavy lifting" for networking. Greg then opened the session for feedback from others about stable kernel management.

Ted Ts'o noted "I'd love to be able to mark some less urgent patches as 'stable-deferred', so that if people discover regressions, I have a chance to pull them back." Greg said that he would try to implement this functionality, as it is a good idea.

A few people wanted to understand more clearly the criteria that determine whether a patch should be sent for the stable series, and others noted that there seemed to be some latitude in what Greg considered to be an acceptable patch. Greg acknowledged the latter point, saying that he trusted subsystem maintainers to make the call about which patches should be sent to stable@vger.kernel.org. As for choosing those patches, people were of course reminded of Documentation/stable_kernel_rules.txt and the summary rationale for stable: if a patch would be of interest to distributions aiming to produce a stable kernel for a release, it should be submitted to stable.
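For reference, Documentation/stable_kernel_rules.txt specifies that such a patch is marked by adding a tag in the sign-off area of the patch itself, so that it is picked up automatically once the patch lands in mainline:

    Cc: stable@vger.kernel.org

An optional comment after the address (for example, "# 3.2.x") can indicate which stable series the fix applies to.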

James Bottomley stated that he got a lot of patches for SCSI that don't apply to the stable kernel, so he strips the stable tag from them. He asked: "what should be done in that case?" Greg answered that he should leave that tag on, and then respond to the automated email he will get when the patch fails to apply to the stable kernel tree with the correct patch for that older kernel tree.

Greg concluded by asking whether the current release pace of the stable series was okay. There was general agreement that the pace—a release every one to two weeks—was good, and many people expressed appreciation for the excellent job Greg is doing on the stable kernel.

Comments (none posted)

KS2012: Improving tracing and debugging

By Michael Kerrisk
September 12, 2012

2012 Kernel Summit

Day one of the 2012 Kernel Summit saw a discussion on improving kernel tracing and debugging, led by Jason Wessel and Steven Rostedt. Jason's particular interest was how to get better tracing information from users who send in reports for kernel crashes.

Most of the session focused on Jason's proposal for kernel changes that would allow source line numbers to be displayed as part of the backtrace that is provided in the event of a kernel crash, so as to allow easier diagnosis of the source of the crash. The proposed technique is implemented by including ELF tables with the necessary symbol information in the compiled kernel. With Jason's patches, use of this feature is straightforward: the kernel is configured with CONFIG_KALLSYMS_LINE_LOCATIONS enabled and built with debugging information included. Once that is done, then events such as kernel panics will generate a call trace that includes source file names and line numbers:

    Call to panic() with the patch set
    ----------------------------------
    Call Trace:
     [<ffffffff815f3003>] panic+0xbd/0x14 panic.c:111
     [<ffffffff815f31f4>] ? printk+0x68/0xd printk.c:765
     [<ffffffffa0000175>] panic_write+0x25/0x30 [test_panic] test_panic.c:189
     [<ffffffff8118aa96>] proc_file_write+0x76/0x21 generic.c:226
     [<ffffffff8118aa20>] ? __proc_create+0x130/0x21 generic.c:211
     [<ffffffff81185678>] proc_reg_write+0x88/0x21 inode.c:218
     [<ffffffff81125718>] vfs_write+0xc8/0x20 read_write.c:435
     [<ffffffff811258d1>] sys_write+0x51/0x19 read_write.c:457
     [<ffffffff815f84d9>] ia32_do_call+0x13/0xc ia32entry.S:427
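With Jason's patches applied, enabling the feature would amount to a kernel configuration along these lines; note that CONFIG_KALLSYMS_LINE_LOCATIONS comes from the proposed patch set and is not a mainline option, while the other two are standard:

    CONFIG_KALLSYMS=y
    CONFIG_KALLSYMS_LINE_LOCATIONS=y
    CONFIG_DEBUG_INFO=y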

The improved call-tracing information provided by these patches would undoubtedly make diagnosing the causes of some kernel crashes easier. However, there is a cost: the memory footprint of the resulting kernel is much larger. During the session, a figure of 20 MB was mentioned, although in a mail that he later sent to the kernel summit discussion list, Jason clarified that the figure was more like 10 MB.

The large increase in kernel memory footprint that results from Jason's technique immediately generated some skepticism about its usefulness. As someone pointed out, such a large increase in kernel size would be unwelcome to users running kernels in cloud-based virtual machines such as Amazon EC2, where the available memory might be limited to (for example) 0.5 GB. Others suggested that it's probably possible to achieve the same result via a suitably built kernel that is loaded by kexec() in the event of a kernel crash. (However, that idea was questioned as well, since the technique might carry a significant memory overhead of its own.)

Linus then weighed in to argue against the proposal altogether. In his view, kernel panics are a small enough part of user bug reports that the cost of this approach is unwarranted; an overhead of something like 1 MB for the increase in memory footprint would be more reasonable, he thought. Linus further opined that one can, with some effort, obtain similar traceback information by loading the kernel into GDB.

Although Jason's proposed patches provide some helpful debugging functionality, the approach met enough negative response that it seems unlikely to be merged in anything like its current form. However, Jason may not be ready to completely give up on the idea yet. In his mail sent soon after the session, he outlined some modifications to his approach that might bring the memory footprint of the feature down to something on the order of 5 MB, as well as other approaches that could give the end user greater control over if and when the feature was deployed on a running kernel. Thus, we may see this idea reappear in a different form at a later date.

Comments (6 posted)

KS2012: Improving development processes: linux-next

By Michael Kerrisk
September 12, 2012

2012 Kernel Summit

The final session of day one of the 2012 Kernel Summit considered the linux-next tree and a possible complementary tree.

Steven Rostedt stated that he'd like to have a "linux-devel" tree, which would serve a similar purpose to that once served by Andrew Morton's "-mm" tree: it would be a place where reasonably stable code sits for a while for longer testing. He noted that such a tree might be useful for an API that hasn't yet stabilized, for example. Steven asked whether others would also be interested in something like this.

Chris Mason questioned whether such a tree could work in practice. "When your work and my work are together, people blame me for your bugs and vice versa." Based on experience with a similar approach in another project, Ben Herrenschmidt noted another problem: people started developing against that code base instead of the designated development base (i.e., the creation of a "linux-devel" tree might cause some people to develop against that tree instead of linux-next). Tony Luck noted that the value of a "linux-devel" tree would depend greatly on how much testing it received, and the sense was that such a tree would likely see less testing than linux-next, which itself could do with more testers.

Of course, even if a "linux-devel" tree were considered worthwhile, the tree would need a maintainer. In response to the question of how much work was required to maintain linux-next, the maintainer, Stephen Rothwell, said it required between four and ten hours per day, depending on the stage in the kernel-release cycle. In the end, as Steven Rostedt himself noted, the overall response to the proposal of a "linux-devel" tree was unenthusiastic.

Attention then briefly turned to the linux-next tree itself, with Ted Ts'o asking whether people were happy with how it was working. The overall consensus seemed to be that it was working well. H. Peter Anvin seemed to sum up the mood, stating his overall contentment with linux-next while noting that "the imperfections of linux-next are reflections of the fact that it is a real-world creation". Ted asked, in a tone that seemed to expect a negative answer, "does anyone run linux-next in anger on their development system?", and was a little surprised to see that quite a number of kernel developers do eat their own dog food, running linux-next as the booted kernel on the systems they use for their daily work. After more than three years, it's clear that linux-next is by now an essential part of the kernel-development model.

Comments (2 posted)

KS2012: Kernel Summit feedback

By Jake Edge
September 12, 2012

2012 Kernel Summit

Ted Ts'o led the final session of this year's Kernel Summit (KS), which was targeted at discussing the summit itself. Over the years, there have been various changes to the format and this year was no exception. The summit was co-located with and overlapped one day of the Linux Plumbers Conference (LPC); the minisummits were moved into the middle of the summit as well. Ts'o and others wondered how well that worked and looked for input on how the meetings should be structured in the future.

Putting the minisummits on day two (Tuesday August 28) turned that day into an "all-day hallway track" for those who weren't participating, Ts'o said. That had both good and bad points, but was in general well-received. The all-day hallway track and minisummits both got a boost from the early arrival of LPC attendees.

The topic choices for day one were good, according to H. Peter Anvin and others. A little more notice of the schedule would have been useful, Anvin said, so that participants could prepare for the discussions. Mel Gorman said that the summit was "sedate" overall, though he thought the topics were well selected. It was not very "entertaining", though, because there wasn't any fighting. Christoph Hellwig noted that the people "we fight with" weren't invited.

James Bottomley wondered if it would have been better to have a "cage fight" on the first day over the two competing NUMA scheduling approaches. Linus Torvalds noted that some may have avoided the memcg minisummit (where that discussion took place), even though they were interested in NUMA scheduling, so that they "didn't have to hear about memcg". But Gorman said that particular problem may have been best handled "relatively privately" in the smaller memory-management-focused group at the memcg minisummit. Opening the discussion up to larger participation might have "made a bad situation a hell of a lot worse".

Torvalds had his own complaint about the minisummits: their schedules. He would have preferred shorter sessions to all-day meetings, because the latter made it harder to switch between them. He sat in on the PCI minisummit, but felt that he would have been coming into the middle of the ARM minisummit had he switched over to attend the AArch64 discussion. He would rather see two-hour, pre-announced, BoF-like sessions.

Ts'o said some of the minisummit schedules came out quite late, which left no time to negotiate changes to reduce conflicts. Hellwig said that what Torvalds was suggesting, perhaps, was the elimination of the minisummits and instead to roll those discussions into longer LPC sessions. That might mean that KS and LPC should always be combined, Bottomley said. But, Arnd Bergmann was not convinced that the influx of LPC attendees was helpful for the ARM minisummit, which was already too big, he said, and got overrun with the additional people.

Others saw few problems in the overlap with LPC, to the point where juxtaposing KS and LPC each year was discussed. One problem with that is that LPC is a North American conference, whereas KS moves around the globe. Next year, LPC will be co-located with LinuxCon in New Orleans, while KS will either be in Edinburgh with LinuxCon Europe or somewhere in Asia, possibly Hong Kong. It doesn't matter what the conference is called, Hellwig said, as long as the format remains and the same types of attendees are present. Anvin cautioned against tying LPC to KS, noting that it can be bad for the other conference in the long run, citing the KS/Ottawa Linux Symposium combination as an example.

It might be possible to see if LPC had any interest in moving to locations outside of North America, or setting up meetings like LPC wherever KS is being held. Chris Mason noted that KS can be a draw for plumbing layer developers no matter where it is held. Dirk Hohndel thought that the same kind of KS/LPC meetings could be set up anywhere and draw in developers from afar as well as those nearby, noting that Korea or Japan would be good candidates. Ts'o agreed that these kinds of meetings bring new people into the community. He said that Hong Kong is under consideration to draw in more Chinese developers, for example.

While the co-location with LPC was seen to be mostly beneficial, the addition of LinuxCon and CloudOpen was a bit much. Those conferences started on Wednesday, which resulted in a large influx of people. That led to some confusion: the rooms where meetings had been held the previous two days were no longer available, it was unclear where to get the lunch available for KS attendees (and there was confusion over who was allowed to eat), and so on. Most in the room were not in favor of doing quite that much overlap in the future. Hohndel noted that the Linux Foundation staff were going "insane" trying to make it all work, so it is unlikely something like that will happen again.

In answer to a question from Bottomley, most present were in favor of moving the KS location each year, and there were suggestions of other possible venues down the road. Some were less likely (e.g. Cuba), while others seemed quite possible (e.g. South America, Korea, or Japan again). Changing the usual (northern hemisphere) summer-to-fall dates for KS was discussed, but the logistics of moving to spring were considered difficult: it would have to be done in stages so that the gap between summits was kept to roughly a year. That also means, for example, that co-locating with linux.conf.au sometime (which was suggested) would be hard to do because it is held in January.

The largely minor complaints aside, the general sense from the discussion was that this year's summit had served its purpose. It got kernel hackers together to discuss areas where the kernel development process could be improved. There will undoubtedly be more tweaks to the format over the years, but the summit itself—like the kernel development process—is working pretty well.

Comments (none posted)

LPC: The realtime microconference

September 12, 2012

This article was contributed by Darren Hart

Thomas Gleixner (Linutronix) led the 2012 Linux Plumbers Realtime Microconference in San Diego this year. The session ran from 9:00 AM until noon on Friday and continued the highly civilized tone prevalent across the sessions of the various co-located conferences.

Thomas took a moment while opening the session to reflect on the passing of Dr. Doug Niehaus and his contributions to real-time operating systems.

Paul E. McKenney (IBM) kicked things off with his presentation on "Getting RCU Further Out of the Way" (reducing the overhead of RCU). Introducing no-callbacks (no-CBs) CPUs allows the RCU callbacks, as well as the grace-period processing, to be offloaded to other CPUs. The callbacks are queued to new lists, which require atomic operations and memory barriers to allow for concurrent access (as the callbacks are no longer created and executed on the same CPU). The prototype is limited by requiring at least one CPU to handle RCU callbacks in order to wait for the grace period and ensure the callbacks are run. It supports both a polling mode and explicit wake-up by call_rcu(). Peter Zijlstra suggested offloading only the callback processing, leaving the grace-period processing on each CPU and moving the callbacks to the atomic list. Paul acknowledged that this would be a good intermediate step, but also indicated that offloading the grace-period processing should not be overly difficult. Several people indicated interest in the improvements.

Steven Rostedt (Red Hat) presented on the challenges of working PREEMPT_RT into mainline in his presentation, "The Banes of Mainline for RT". He discussed interrupt handling in the PREEMPT_RT kernel, which has periodically swung back and forth between more and fewer threads in an attempt to balance lowest latency with lowest overhead as well as maintainability and upstream acceptance. Per-device interrupt handlers are considered ideal as they allow for finer control of priorities for handlers on all-too-common shared interrupt lines.

He also spent some time discussing common livelock scenarios from mainline that the PREEMPT_RT kernel has to work around. First, the use of a nested trylock defeats priority boosting. The solution is to drop the conflicting locks, acquire the contended lock and immediately release it, then attempt the lock sequence again. This approach ensures that priority boosting takes place and that any inversion becomes bounded. The practice of __do_softirq() raising its own softirq in the event of a failed spin_trylock() can lead to a livelock in PREEMPT_RT, where ksoftirqd runs at realtime priority and all softirqs run as preemptable threads. The solution here is to simply acquire the spinlock in PREEMPT_RT, where the spinlock is converted into a mutex, allowing the lock holder to be scheduled and complete its critical section. Steven threatened to get rid of softirqs entirely.

Peter Zijlstra (Red Hat) briefly discussed the SCHED_DEADLINE scheduler, including a request for would-be users to provide details of their use cases, which he can use to justify the inclusion of the code upstream. While he is in favor of pushing the patches, apparently even Peter Zijlstra has to provide convincing evidence before pushing yet another scheduling policy into the mainline kernel. It was reiterated that many media applications for Mac OS X use that system's EDF scheduler. Juri Lelli noted that Scuola Superiore Sant'Anna has "toy" media players that use SCHED_DEADLINE. Contacting the JACK community was suggested as well. As SCHED_DEADLINE requires the application to specify its periodicity and duration, adoption may be slow. Fortunately, there are straightforward methods of determining these parameters.

Frank Rowand (Sony) prepared some material on the usage of PREEMPT_RT with an emphasis on the stable trees. He also presented some wishlist items collected from users. While Thomas produces patch tarballs for the development PREEMPT_RT tree, Steven currently releases stable PREEMPT_RT as a git branch. While interest remains for the git branches, Steven has agreed to also release the stable trees as patch tarballs (including previous releases). Some confusion was noted regarding the development process for the stable trees, such as which mailing lists to use, as well as the difference between Steven's stable branches and the "OSADL Latest Stable" releases. It was agreed to add this information in a README, and include that in-tree along with the PREEMPT_RT releases.

Some sort of issue tracker was requested. Darren Hart (Intel) and Clark Williams (Red Hat) agreed to work with bugzilla.kernel.org to get one set up for the PREEMPT_RT tree.

Frank continued to lead a lengthy discussion on the stable real-time release process. The two areas of concern were the amount of testing these trees receive and which versions to continue supporting (including the number of concurrent trees). While Steven's stable trees are widely used by numerous organizations, they do not receive much testing outside his machines before they are released. It should be noted, however, that any patches to his stable trees must have first spent some time in Thomas's development tree, and have therefore seen some testing before they are pulled in.

Carsten Emde's (OSADL) long-term load systems, on the other hand, perform sequential long-running cyclictest runs, one of which recently completed one year of uptime without violating real-time constraints in 160 billion cycles. Carsten, who was not present, defines the OSADL Latest Stable criteria as: "all our development systems in the QA Farm must be running this kernel for at least a month under all appropriate load scenarios without any problem." Thomas has agreed to add Steven's trees to his automated testing to help improve the level of testing of the stable releases.

As for the longevity of the long-term stable releases (3.0, 3.2, 3.4, ...), Steven drops support for a stable release when Greg Kroah-Hartman does. Ben Hutchings's stable tree appears to be the sole exception; Steven will continue to support one Hutchings stable tree at a time, as long as time permits.

Darren brought up longer term support and alignment with the Long-Term Support Initiative as the Yocto Project supports both PREEMPT_RT as well as LTSI, and alignment here significantly reduces the support effort. If an LTSI PREEMPT_RT tree is to be maintained, someone will need to volunteer for the task. Darren indicated the Yocto Project is likely to do so.

Following Frank, Luis Claudio Goncalves (Red Hat) discussed the joys of working with customers on real-world use cases. Customers often push the boundaries of what has been tested by running on much larger systems than you might expect. They also frequently run into performance issues or livelocks with programming mechanisms that more or less work in mainline, but definitely do not when running with real-time priority. CPU-bound threads, including busy loops intended to avoid latency via polling, can starve the system when run as high-priority realtime threads, resulting in large latency spikes or system freezes. Running a single process with 1000 threads leads to heavy contention on mmap_sem; large changes would be required for PREEMPT_RT to deal with this scenario well.

The concept of CPU isolation has its own pitfalls. An "isolated" CPU still runs several kernel threads and requires housekeeping, while users may expect the CPU to run only the isolated application. In these scenarios, the application is commonly set to SCHED_FIFO at priority 99, resulting in severe latencies and system freezes as the lower-priority kernel threads are prevented from running. A list of the work that must currently be performed by the kernel on an isolated CPU is documented in Gilad Ben-Yossef's list on the Linux wiki. Some of the issues listed there have been fixed or at least minimized; Paul's presentation addressed two of them. Additionally, Luis has volunteered to work on some best-practices documentation, but has asked for people to help review it.

In closing, Thomas noted that there would not be a 3.5-rt release, but that he would be looking at 3.6 for the next PREEMPT_RT release. The further refinement of IRQ handling was mentioned as one of the most noteworthy changes planned for the 3.6-rt release.

Thanks to all the presenters, participants, and reviewers, as well as Paul E. McKenney, whose notes helped to show that two note-takers are better than one.

Comments (1 posted)

Patches and updates

Kernel trees

Linus Torvalds: Linux 3.6-rc5
Ben Hutchings: Linux 3.2.29

Architecture-specific

Core kernel code

Development tools

Device drivers

Filesystems and block I/O

Memory management

Security-related

Miscellaneous

Lucas De Marchi: kmod 10

Page editor: Jonathan Corbet


Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds