
LPC: The realtime microconference

September 12, 2012

This article was contributed by Darren Hart

Thomas Gleixner (Linutronix) led the 2012 Linux Plumbers Realtime Microconference in San Diego. The session ran from 9:00 AM until noon on Friday and continued the highly civilized tone prevalent across the sessions of the various co-located conferences this year.

Thomas took a moment while opening the session to reflect on the passing of Dr. Doug Niehaus and his contributions to real-time operating systems.

Paul E. McKenney (IBM) kicked things off with his presentation on "Getting RCU Further Out of the Way" (reducing the overhead of RCU). Introducing no-callbacks (no-CBs) CPUs allows the RCU callbacks, as well as the grace-period processing, to be offloaded to other CPUs. The callbacks are queued to new lists, which require atomic operations and memory barriers to allow for concurrent access (as they are no longer created and executed on the same CPU). The prototype is limited by requiring at least one CPU to handle RCU callbacks in order to wait for the grace period and ensure the callbacks are run. It supports both polling mode as well as explicit wake-up by call_rcu(). Peter Zijlstra suggested offloading only the callback processing by leaving the grace-period processing on each CPU and moving the callbacks to the atomic list. Paul acknowledged it to be a good intermediate step, but also indicated that offloading the grace-period processing should not be overly difficult. Several people indicated interest in the improvements.
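
As a reminder of the interface whose overhead is being reduced, here is a minimal call_rcu() usage sketch; the structure and callback names are illustrative, not taken from Paul's patches. On a no-CBs CPU, the queued callback would be invoked by a kthread on another CPU rather than locally.

    /* Illustrative kernel-code sketch of call_rcu() usage; "struct foo"
     * and foo_free_rcu() are made-up names for this example. */
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct foo {
            int data;
            struct rcu_head rcu;            /* embedded for deferred freeing */
    };

    static void foo_free_rcu(struct rcu_head *head)
    {
            /* Runs after a grace period has elapsed; with the no-CBs
             * prototype this would run on a different CPU. */
            kfree(container_of(head, struct foo, rcu));
    }

    static void foo_release(struct foo *fp)
    {
            /* Queue the callback; readers still traversing fp remain
             * safe until the grace period ends. */
            call_rcu(&fp->rcu, foo_free_rcu);
    }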

Steven Rostedt (Red Hat) presented on the challenges of working PREEMPT_RT into mainline in his presentation, "The Banes of Mainline for RT". He discussed interrupt handling in the PREEMPT_RT kernel, which has periodically swung back and forth between more and fewer threads in an attempt to balance lowest latency with lowest overhead as well as maintainability and upstream acceptance. Per-device interrupt handlers are considered ideal as they allow for finer control of priorities for handlers on all-too-common shared interrupt lines.

He also spent some time discussing common livelock scenarios from mainline that the PREEMPT_RT kernel has to work around. First, the use of a nested trylock defeats priority boosting. The solution is to drop the conflicting locks, acquire the contended lock and immediately release it, then attempt the lock sequence again; this ensures the priority boosting takes place and the possible inversion becomes bounded. Second, the practice of __do_softirq() raising its own softirq in the event of a failed spin_trylock() can lead to a livelock in PREEMPT_RT, where ksoftirqd runs at realtime priority and all softirqs run as preemptable threads. The solution here is to simply acquire the spinlock on PREEMPT_RT, where the spinlock is converted into a mutex, allowing the lock holder to be scheduled and complete its critical section. Steven threatened to get rid of softirqs entirely.
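
A rough sketch of that back-off pattern follows; the lock names are illustrative rather than taken from any particular driver. On PREEMPT_RT, blocking on the contended lock (which is really an rtmutex) lets priority inheritance boost the owner before the whole sequence is retried.

    /* Hypothetical sketch of the trylock back-off pattern described above. */
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(outer_lock);     /* illustrative lock names */
    static DEFINE_SPINLOCK(inner_lock);

    static void locked_operation(void)
    {
    retry:
            spin_lock(&outer_lock);
            if (!spin_trylock(&inner_lock)) {
                    spin_unlock(&outer_lock);       /* drop the conflicting lock */
                    spin_lock(&inner_lock);         /* block, boosting the owner on RT */
                    spin_unlock(&inner_lock);       /* release immediately... */
                    goto retry;                     /* ...then retry the whole sequence */
            }
            /* critical section holding both locks */
            spin_unlock(&inner_lock);
            spin_unlock(&outer_lock);
    }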

Peter Zijlstra (Red Hat) briefly discussed the SCHED_DEADLINE scheduler, including a request for would-be users to provide details of their use cases, which he can use to justify the inclusion of the code upstream. While he is in favor of pushing the patches, apparently even Peter Zijlstra has to provide convincing evidence before pushing yet another scheduling policy into the mainline Linux kernel. It was reiterated that many media applications for Mac OS X use that system's EDF scheduler. Juri Lelli noted that Scuola Superiore Sant'Anna has "toy" media players that use SCHED_DEADLINE. Contacting the JACK community was suggested as well. As SCHED_DEADLINE requires the application to specify its periodicity and duration, adoption may be slow; fortunately, there are straightforward methods of determining these parameters.
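
For readers unfamiliar with how those parameters are expressed, here is a hedged sketch using the sched_setattr() system call as it was eventually merged into mainline; the out-of-tree interface at the time of this session differed in detail, the structure below simply mirrors the layout of the kernel's struct sched_attr, and the numbers are invented. It needs a kernel and headers that provide the system call.

    /* Hedged example: declare a 2 ms runtime every 10 ms period to
     * SCHED_DEADLINE via sched_setattr(). Values are made up. */
    #define _GNU_SOURCE
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    #ifndef SCHED_DEADLINE
    #define SCHED_DEADLINE  6
    #endif

    struct dl_sched_attr {                  /* mirrors the kernel's struct sched_attr */
            uint32_t size;
            uint32_t sched_policy;
            uint64_t sched_flags;
            int32_t  sched_nice;
            uint32_t sched_priority;
            uint64_t sched_runtime;         /* worst-case execution time per period */
            uint64_t sched_deadline;        /* relative deadline */
            uint64_t sched_period;          /* activation period */
    };

    int main(void)
    {
            struct dl_sched_attr attr = {
                    .size           = sizeof(attr),
                    .sched_policy   = SCHED_DEADLINE,
                    .sched_runtime  =  2 * 1000 * 1000,     /*  2 ms of CPU time... */
                    .sched_deadline = 10 * 1000 * 1000,     /* ...due within 10 ms... */
                    .sched_period   = 10 * 1000 * 1000,     /* ...every 10 ms */
            };

            if (syscall(SYS_sched_setattr, 0, &attr, 0) < 0) {
                    perror("sched_setattr");
                    return 1;
            }
            /* the periodic real-time work would run here */
            return 0;
    }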

Frank Rowand (Sony) prepared some material on the usage of PREEMPT_RT with an emphasis on the stable trees. He also presented some wishlist items collected from users. While Thomas produces patch tarballs for the development PREEMPT_RT tree, Steven currently releases stable PREEMPT_RT as a git branch. While interest remains in the git branches, Steven has agreed to also release the stable trees as patch tarballs (including previous releases). Some confusion was noted regarding the development process for the stable trees, such as which mailing lists to use, as well as the difference between Steven's stable branches and the "OSADL Latest Stable" releases. It was agreed to add this information to a README and include it in-tree along with the PREEMPT_RT releases.

Some sort of issue tracker was requested. Darren Hart (Intel) and Clark Williams (Red Hat) agreed to work with bugzilla.kernel.org to get one set up for the PREEMPT_RT tree.

Frank continued to lead a lengthy discussion on the stable real-time release process. The two areas of concern were the amount of testing these trees receive and which versions to continue supporting (including the number of concurrent trees). While Steven's stable trees are widely used by numerous organizations, they do not receive much testing outside his machines before they are released. It should be noted, however, that any patches to his stable trees must have first spent some time in Thomas's development tree, and have therefore seen some testing before they are pulled in.

Carsten Emde's (OSADL) long-term load systems, on the other hand, perform sequential long-running cyclictest runs, one of which recently completed one year of uptime without violating real-time constraints in 160 billion cycles. Carsten, who was not present, defines the OSADL Latest Stable criteria as: "all our development systems in the QA Farm must be running this kernel for at least a month under all appropriate load scenarios without any problem." Thomas has agreed to add Steven's trees to his automated testing to help improve the level of testing of the stable releases.

As for the longevity of the long-term stable releases (3.0, 3.2, 3.4, ...) Steven drops support for a stable release when Greg Kroah-Hartman does. Ben Hutchings's stable tree appears to be the sole exception. Steven will continue to support one Hutchings stable tree at a time, so long as time permits.

Darren brought up longer-term support and alignment with the Long Term Support Initiative (LTSI), as the Yocto Project supports both PREEMPT_RT and LTSI, and alignment here significantly reduces the support effort. If an LTSI PREEMPT_RT tree is to be maintained, someone will need to volunteer for the task; Darren indicated the Yocto Project is likely to do so.

Following Frank, Luis Claudio Goncalves (Red Hat) discussed the joys of working with customers on real-world use cases. Customers often push the boundaries of what has been tested by running on much larger systems than you might expect. They also frequently run into performance issues or livelocks with programming mechanisms that more or less work in mainline, but definitely do not when running with real-time priority. CPU-bound threads, including busy loops that poll to avoid latency, can starve the system when run as high-priority realtime threads, resulting in large latency spikes or system freezes. Running a single process with 1000 threads leads to heavy contention on mmap_sem; large changes would be required for PREEMPT_RT to deal with this scenario well.

The concept of CPU isolation has its own pitfalls. An "isolated" CPU still runs several kernel threads and requires housekeeping, while users may expect the CPU to run only the isolated application. In these scenarios, the application is commonly set to SCHED_FIFO at priority 99, resulting in severe latencies and system freezes as the lower-priority kernel threads are prevented from running. A list of the work the kernel currently must perform on an isolated CPU is documented on Gilad Ben-Yossef's Linux wiki page. Some of the issues listed there have been fixed or at least minimized; Paul's presentation addressed two of them. Additionally, Luis has volunteered to work on some best-practices documentation, but has asked for people to help review it.
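
As a concrete illustration of the pattern Luis warned about (not code from his presentation), a busy-polling thread pinned to an "isolated" CPU at SCHED_FIFO priority 99 might be set up as follows; once it runs, the per-CPU kernel threads on that CPU can no longer make progress. The CPU number and priority are assumptions made for the example.

    /* Illustrative anti-pattern: do NOT do this on a production system. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static void *poll_loop(void *arg)
    {
            (void)arg;
            /* Busy-wait "to avoid latency": nothing of lower priority,
             * including per-CPU kernel threads, can run on this CPU. */
            for (;;)
                    ;       /* poll device registers or shared memory here */
            return NULL;
    }

    int main(void)
    {
            pthread_attr_t attr;
            struct sched_param sp = { .sched_priority = 99 };
            cpu_set_t cpus;
            pthread_t tid;

            CPU_ZERO(&cpus);
            CPU_SET(3, &cpus);      /* assume CPU 3 was "isolated" at boot */

            pthread_attr_init(&attr);
            pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
            pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
            pthread_attr_setschedparam(&attr, &sp);
            pthread_attr_setaffinity_np(&attr, sizeof(cpus), &cpus);

            if (pthread_create(&tid, &attr, poll_loop, NULL)) {
                    fprintf(stderr, "pthread_create failed (CAP_SYS_NICE needed?)\n");
                    return 1;
            }
            pthread_join(tid, NULL);
            return 0;
    }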

In closing, Thomas noted that there would not be a 3.5-rt release, but that he would be looking at 3.6 for the next PREEMPT_RT release. The further refinement of IRQ handling was mentioned as one of the most noteworthy changes planned for the 3.6-rt release.

Thanks to all the presenters, participants, and reviewers, as well as Paul E. McKenney, whose notes helped to show that two note-takers are better than one.

Index entries for this article
GuestArticles: Hart, Darren
Conference: Linux Plumbers Conference/2012



Why OSADL's "Latest Stable"?

Posted Sep 13, 2012 7:14 UTC (Thu) by cemde (subscriber, #51020) [Link]

Hi Darren,

thanks a lot for writing this report! In case someone is still asking "Why is OSADL's Latest Stable needed in addition to Steven's releases?", I would explain it with a comparison of the mainline and PREEMPT_RT release strategies.

Mainline:
A kernel with the -rcX suffix is not yet released; a kernel without this suffix is released. People then consider a released kernel as stable (although they may need to apply subsequent stable release patches).

PREEMPT_RT:
When adaptation of the PREEMPT_RT patches starts for a given kernel, the -rtX suffix is appended to its version, e.g. -rt1, but this only says that the patches are applied. The -rtX number then increases steadily and somewhere between, for example, 20 and 40, the thing becomes good enough that it can be used. But this very moment is not visible, since we do not use something like an -rcrtX suffix that would eventually become -rtX.

Why OSADL's "Latest Stable"?
OSADL's "Latest Stable" simply was intended to make it visible when a PREEMPT_RT patch along with a given kernel version is suitable for use in all our test systems. Adding more and more systems to our QA farm for testing and requesting that the kernel must not only be suitable for use but for integration into an industrial product made it somewhat more difficult. But the principle still is the same, i.e. that we would like to mark the transition between the development period and the release period of a PREEMPT_RT patched kernel.

For the time being, crashes, regressions and other problems of the farm systems are collected at our farm but not sent out. I promise to send all of them on a regular basis to the new PREEMPT_RT Bugzilla tracker - everybody can then check out the open bugs and determine whether they affect a particular system or not. And OSADL's "Latest Stable" web page will be closely synchronized to this list. So we may have "particular stability" as opposed to "general stability". Integrators can then decide to use a PREEMPT_RT kernel version that apparently is stable on a particular system, even though this version cannot be generally recommended. This hopefully will reduce the number of complaints that OSADL's "Latest Stable" PREEMPT_RT kernel version is so far behind.


Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds