September 12, 2012
This article was contributed by Darren Hart
Thomas Gleixner (Linutronix) led the
2012 Linux Plumbers Realtime
Microconference in San Diego. The session ran from 9:00 AM until noon on Friday and continued the highly civilized tone prevalent across the sessions of the various co-located conferences this year.
Thomas took a moment while opening the session to reflect on the passing of Dr.
Doug Niehaus and his contributions to real-time operating systems.
Paul E. McKenney (IBM) kicked things off with his presentation on "Getting RCU
Further Out of the Way" (reducing the overhead of RCU). Introducing no-callbacks
CPUs (no-CBs) allows the RCU callbacks as well as the grace period processing to
be offloaded to other CPUs. The callbacks are queued to new lists which require
atomic operations and memory barriers to allow for concurrent access (as they
are no longer created and executed on the same CPU). The prototype is limited by
requiring at least one CPU to handle RCU callbacks in order to wait for the grace
period and ensure the callbacks are run. It supports both a polling mode and explicit wake-up by call_rcu(). Peter Zijlstra suggested offloading only the
callback processing by leaving the grace period processing on each CPU and
moving the callbacks to the atomic list. Paul acknowledged it to be a good
intermediate step, but also indicated that offloading the grace period
processing should not be overly difficult. Several people indicated interest in
the improvements.
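As a point of reference, here is a minimal kernel-style sketch of how a callback is posted with call_rcu(); the no-CBs work changes where such callbacks are invoked, not how they are queued. The my_data structure and its helper functions are hypothetical names used only for illustration.

    #include <linux/kernel.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    /* Hypothetical RCU-protected object. */
    struct my_data {
        int value;
        struct rcu_head rcu;        /* embedded for call_rcu() */
    };

    /* Runs after a grace period; with callback offloading this is
       invoked from a dedicated kthread rather than from this CPU's
       softirq context. */
    static void my_data_free(struct rcu_head *head)
    {
        kfree(container_of(head, struct my_data, rcu));
    }

    /* Retire an object that readers may still be referencing. */
    static void my_data_retire(struct my_data *p)
    {
        call_rcu(&p->rcu, my_data_free);
    }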
Steven Rostedt (Red Hat) covered the challenges of working PREEMPT_RT into
mainline in his presentation, "The Banes of Mainline for RT". He discussed
interrupt handling in the PREEMPT_RT kernel, which has periodically swung back
and forth between more and fewer threads in an attempt to balance lowest latency
with lowest overhead as well as maintainability and upstream acceptance.
Per-device interrupt handlers are considered ideal as they allow for finer
control of priorities for handlers on all-too-common shared interrupt
lines.
He
also spent some time discussing common livelock scenarios from mainline that the
PREEMPT_RT kernel has to work around. First, the use of a nested trylock defeats priority boosting: a trylock never blocks, so it never boosts the priority of the current lock holder. The workaround (sketched below) is to drop the locks already held, acquire the contended lock with a blocking call so that priority inheritance boosts its owner, release it immediately, and then retry the whole lock sequence. This approach ensures that priority boosting takes place and the possible inversion becomes bounded. The practice of __do_softirq() raising its own
softirq in the event of a failed spin_trylock() can lead to a livelock in
PREEMPT_RT where ksoftirqd is run at realtime priority and all
softirqs are run
as preemptable threads. The solution here is to simply acquire the spinlock for
PREEMPT_RT, where the spinlock is converted into a mutex, allowing the lock
holder to be scheduled and complete its critical section. Steven threatened to get rid of softirqs entirely.
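The retry pattern for the nested-trylock case might look roughly like the following; the my_obj structure and lock_both() function are hypothetical. On PREEMPT_RT these spinlocks become rtmutexes, so the blocking spin_lock() on the contended lock is what boosts its owner.

    #include <linux/spinlock.h>

    /* Hypothetical objects, each protected by its own lock. */
    struct my_obj {
        spinlock_t lock;
        /* ... */
    };

    /* Take a->lock then b->lock without deadlock or unbounded inversion. */
    static void lock_both(struct my_obj *a, struct my_obj *b)
    {
        spin_lock(&a->lock);
        while (!spin_trylock(&b->lock)) {
            /* Trylock failed: drop the lock we hold to avoid deadlock,
               then block on B so priority inheritance boosts its owner. */
            spin_unlock(&a->lock);
            spin_lock(&b->lock);
            spin_unlock(&b->lock);
            /* Retry the whole sequence from a known state. */
            spin_lock(&a->lock);
        }
        /* Both locks are held here; the critical section would go here,
           followed by the unlocks in reverse order. */
        spin_unlock(&b->lock);
        spin_unlock(&a->lock);
    }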
Peter Zijlstra (Red Hat) briefly discussed the SCHED_DEADLINE scheduler,
including a request for would-be users to provide details of their use cases
which he can use to justify the inclusion of the code upstream. While he is in
favor of pushing the patches, apparently even Peter Zijlstra has to provide
convincing evidence before pushing yet another scheduling policy into the
mainline Linux kernel. It was reiterated that many media applications for Mac
OS X use the Mac EDF scheduler. Juri Lelli noted that Scuola Superiore Sant'Anna
has "toy" media players that use SCHED_DEADLINE. Contacting the JACK
community was suggested as well. As SCHED_DEADLINE requires the application to
specify its periodicity and duration, adoption may be slow. Fortunately, there
are straightforward methods of determining these parameters.
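For illustration, a minimal sketch of how an application might declare those parameters appears below. It uses the sched_setattr() system-call interface that the SCHED_DEADLINE work eventually settled on in mainline; the interface was still in flux at the time, so the syscall, the locally declared struct sched_attr, and the chosen 2 ms runtime within a 10 ms period are assumptions made for the example.

    #define _GNU_SOURCE
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    #ifndef SCHED_DEADLINE
    #define SCHED_DEADLINE 6
    #endif

    /* No libc wrapper or header provided this at the time, so the
       structure is declared locally; the layout matches the interface
       that was later merged. */
    struct sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime;     /* worst-case execution time, ns */
        uint64_t sched_deadline;    /* relative deadline, ns */
        uint64_t sched_period;      /* activation period, ns */
    };

    int main(void)
    {
        struct sched_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.sched_policy   = SCHED_DEADLINE;
        attr.sched_runtime  =  2 * 1000 * 1000;   /*  2 ms of CPU per period */
        attr.sched_deadline = 10 * 1000 * 1000;   /* finish within 10 ms */
        attr.sched_period   = 10 * 1000 * 1000;   /* released every 10 ms */

        if (syscall(SYS_sched_setattr, 0, &attr, 0) < 0) {
            perror("sched_setattr");
            return 1;
        }

        /* ... the periodic real-time work would run here ... */
        return 0;
    }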
Frank Rowand (Sony) prepared some material on the usage of PREEMPT_RT with an
emphasis on the stable trees. He also presented some wishlist items collected
from users. While Thomas produces patch tarballs for the development PREEMPT_RT
tree, Steven currently releases stable PREEMPT_RT as a git branch. Interest remains in the git branches, but Steven has agreed to also release the stable trees as patch tarballs (including previous releases). Some confusion was
noted regarding the development process for the stable trees, such as which
mailing lists to use, as well as the difference between Steven's stable branches
and the "OSADL Latest Stable" releases. It was agreed to add this information in
a README, and include that in-tree along with the PREEMPT_RT releases.
Some sort of issue tracker was requested. Darren Hart (Intel) and Clark Williams
(Red Hat) agreed to work with bugzilla.kernel.org to get one set up for the
PREEMPT_RT tree.
Frank continued to lead a lengthy discussion on the stable real-time release
process. The two areas of concern were the amount of testing these trees receive
and which versions to continue supporting (including the number of concurrent
trees). While Steven's stable trees are widely used by numerous organizations,
they do not receive much testing outside his machines before they are released.
It should be noted, however, that any patches to his stable trees must have
first spent some time in Thomas's development tree, and have therefore seen some
testing before they are pulled in.
Carsten Emde's (OSADL) long-term load
systems, on the other hand, perform sequential long-running cyclictest runs, one
of which recently completed one year of uptime without violating real-time
constraints in 160 billion cycles. Carsten, who was not present, defines the
OSADL Latest Stable criteria as: "all our development systems in the QA Farm
must be running this kernel for at least a month under all appropriate load
scenarios without any problem." Thomas has agreed to add Steven's trees to his
automated testing to help improve the level of testing of the stable releases.
As for the longevity of the long-term stable releases (3.0, 3.2, 3.4, ...), Steven drops support for a stable release when Greg Kroah-Hartman does. Ben
Hutchings's
stable tree appears to be the sole exception. Steven will continue to support
one Hutchings stable tree at a time, so long as time permits.
Darren brought up longer term support and alignment with the Long-Term Support
Initiative as the Yocto
Project supports both PREEMPT_RT as well as LTSI, and alignment here
significantly reduces the support effort. If an LTSI PREEMPT_RT tree is to be
maintained, someone will need to volunteer for the task. Darren indicated the
Yocto Project is likely to do so.
Following Frank, Luis Claudio Goncalves (Red Hat) discussed the joys of working
with customers on real-world use-cases. Customers often push the boundaries of
what has been tested by running on much larger systems than you might expect.
They also frequently run into performance issues or livelocks with programming
mechanisms that more or less work in mainline, but definitely do not when
running with real-time priority. CPU-bound threads, including busy loops used to avoid latency via polling, can starve the system when run as high-priority realtime threads, resulting in large latency spikes or system freezes. Running
a single process with 1000 threads leads to heavy contention on
mmap_sem; large
changes would be required for PREEMPT_RT to deal with this scenario well.
The
concept of CPU isolation has its own pitfalls. An "isolated" CPU still runs
several kernel threads and requires housekeeping, while users may expect the CPU
to run only the isolated application. In these scenarios, the application is
commonly set to SCHED_FIFO at priority 99, resulting in severe latencies and
system freezes as the lower priority kernel threads are prevented from running. A
list of the current work that must be performed by the kernel on an isolated CPU
is documented in Gilad
Ben-Yossef's Linux wiki. Some of the issues listed
there have been fixed or at least minimized. Paul's presentation addressed two
of them. Additionally, Luis has volunteered to work on some best-practices documentation, but has asked for people to help review it.
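To make the pitfall concrete, here is a cautionary sketch of the pattern described above: a busy-polling thread pinned to a nominally isolated CPU at SCHED_FIFO priority 99. The CPU number and the polling work are hypothetical; the point is that the lower-priority per-CPU kernel threads on that CPU are starved of time, so a lower application priority (or a design that blocks or yields) is needed to leave them headroom.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    #define ISOLATED_CPU 3          /* hypothetical CPU reserved for the app */

    /* Busy-polling worker: never blocks, never yields. */
    static void *poll_loop(void *arg)
    {
        (void)arg;
        for (;;) {
            /* poll a device register or shared-memory flag here */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t thread;
        pthread_attr_t attr;
        cpu_set_t cpus;
        struct sched_param param = { .sched_priority = 99 };  /* the problem */
        int ret;

        CPU_ZERO(&cpus);
        CPU_SET(ISOLATED_CPU, &cpus);

        pthread_attr_init(&attr);
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        pthread_attr_setschedparam(&attr, &param);
        pthread_attr_setaffinity_np(&attr, sizeof(cpus), &cpus);

        /* At priority 99 this thread outranks the lower-priority kernel
           threads bound to ISOLATED_CPU, starving their housekeeping work. */
        ret = pthread_create(&thread, &attr, poll_loop, NULL);
        if (ret != 0) {
            fprintf(stderr, "pthread_create: %s\n", strerror(ret));
            return 1;
        }
        pthread_join(thread, NULL);
        return 0;
    }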
In closing, Thomas noted that there would not be a 3.5-rt release, but that he
would be looking at 3.6 for the next PREEMPT_RT release. The further refinement
of IRQ handling was mentioned as one of the most noteworthy changes planned for
the 3.6-rt release.
Thanks to all the presenters, participants, and reviewers, as well as Paul E.
McKenney, whose notes helped to show that two note-takers are better than one.