There are far too many interesting Linux and free software conferences
these days, so it would be difficult—really, impossible—to
attend them all. Slides and videos of the talks can help fill in the gaps,
but, for conferences with a more academic bent, the papers that are the
basis of the presentations can give an even more detailed look. The papers
from the recently concluded Real
Time Linux Workshop are a good example; this article will briefly look
at a few of them.
Myths and Realities of Real-Time Linux Software Systems
[PDF] can serve as an introduction to realtime for those who
are not familiar with what that means. Author Kushal Koolwal starts by
defining realtime, describing various kinds of latencies, and looking at
hard vs. soft realtime, before moving on to a few myths. Koolwal then looks
at realtime in Linux, focusing on the PREEMPT_RT patchset. In a
few short pages, this paper will give the reader a good foundation in
realtime and the trade-offs necessary to support it.
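Wakeup latency, one of the kinds of latency Koolwal catalogs, can be observed from user space with a cyclictest-style loop: request a wakeup at an absolute time and measure how late the wakeup actually arrives. The following is my own minimal sketch of that technique, not code from the paper:

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <time.h>

/* Return the wakeup latency, in nanoseconds, of one clock_nanosleep()
 * call: the difference between when we asked to wake up and when we
 * actually did. */
int64_t wakeup_latency_ns(long interval_ns)
{
    struct timespec target, now;

    clock_gettime(CLOCK_MONOTONIC, &target);
    target.tv_nsec += interval_ns;
    if (target.tv_nsec >= 1000000000L) {
        target.tv_sec += 1;
        target.tv_nsec -= 1000000000L;
    }
    /* Sleep until the absolute target time. */
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &target, NULL);
    clock_gettime(CLOCK_MONOTONIC, &now);

    return (now.tv_sec - target.tv_sec) * 1000000000L +
           (now.tv_nsec - target.tv_nsec);
}

/* Run iters sleeps and report the worst-case latency seen. */
int64_t max_wakeup_latency_ns(int iters, long interval_ns)
{
    int64_t max = 0;
    for (int i = 0; i < iters; i++) {
        int64_t lat = wakeup_latency_ns(interval_ns);
        if (lat > max)
            max = lat;
    }
    return max;
}
```

Comparing the maximum reported by a loop like this on a stock kernel versus a PREEMPT_RT kernel, under load, shows exactly the kind of trade-off the paper discusses.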
Finding origins of latencies using Ftrace
In his paper
[PDF], ftrace developer Steven Rostedt describes how to use ftrace to find
unexpected or unacceptable latencies that can be a barrier to
realtime processing. Ftrace is a relatively new tool in the kernel that provides
various kinds of tracing information and has some facilities that can be
used specifically for tracking down latency issues. Tracers like
irqsoff, preemptoff, and wakeup (along with some
variants) capture information while the kernel is running in specific modes
(i.e. with interrupts disabled, preemption turned off, etc.).
Rostedt's paper gives a fairly detailed look at the tracers, how to
enable them, what they do, and the output they produce. While these
latency tracers are active, they capture things like kernel functions
called or trace event points encountered while looking for the maximum time
spent in the latency-causing modes. Looking at what the kernel is doing
when the latency exceeds expectations can lead a developer to the
specific cause, which in turn may suggest a way to reduce the latency.
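Since ftrace is controlled through ordinary files under debugfs, enabling a latency tracer and reading back its result is just a matter of file I/O. The sketch below is my own, not code from the paper; it assumes root privileges and debugfs mounted at the usual location, and returns -1 otherwise:

```c
#include <stdio.h>

#define TRACE_DIR "/sys/kernel/debug/tracing"

/* Select a latency tracer (e.g. "irqsoff", "preemptoff", or "wakeup")
 * by writing its name to current_tracer.  Returns 0 on success, -1 if
 * tracefs is unavailable or we lack permission. */
int select_tracer(const char *name)
{
    FILE *f = fopen(TRACE_DIR "/current_tracer", "w");
    if (!f)
        return -1;
    fprintf(f, "%s\n", name);
    fclose(f);
    return 0;
}

/* Read back the maximum latency (in microseconds) recorded by the
 * currently selected tracer, or -1 if it cannot be read. */
long read_max_latency_us(void)
{
    long lat;
    FILE *f = fopen(TRACE_DIR "/tracing_max_latency", "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &lat) != 1)
        lat = -1;
    fclose(f);
    return lat;
}
```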
Rostedt mentions the JACK "audio connection kit" developers as an early
adopter of latency tracing, noting that they found both kernel and JACK
bugs that were causing excess latency.
Towards Linux as a Real-Time Hypervisor
Jan Kiszka reported
[PDF] on experiments using Linux as a hypervisor for realtime
processing. Using KVM and QEMU, he measured the latency in both the host
and guest operating systems under a number of different scenarios. One of
the more obvious means to increase the responsiveness of the guest is to
raise the priority of the QEMU threads and to put them into a realtime
scheduling class. But that can starve host OS processes that the
guest is waiting on, which could lead to deadlock or other undesirable
behavior.
The paper reports on the measurements of average and maximum latency, as
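Boosting a QEMU vCPU thread in this way comes down to a sched_setscheduler() call; the wrapper below is my own sketch, not code from the paper:

```c
#include <sys/types.h>
#include <sched.h>

/* Move a thread (identified by TID, or 0 for the caller) into the
 * SCHED_FIFO realtime class at the given priority -- roughly what one
 * would do to a QEMU vcpu thread.  Returns 0 on success, -1 on
 * failure (e.g. EPERM when not run with sufficient privileges). */
int make_realtime(pid_t tid, int priority)
{
    struct sched_param sp = { .sched_priority = priority };
    return sched_setscheduler(tid, SCHED_FIFO, &sp);
}
```

It is precisely a boost like this, applied indiscriminately, that risks starving the host-side services the guest depends on.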
part of a latency histogram, under
different conditions: a baseline test in the host as well as in the guest,
applying the priority and scheduling class changes to the guest, lowering
the priority on the asynchronous I/O (AIO) QEMU threads, and using a
PREEMPT_RT kernel on the host. In addition, Kiszka describes a
"paravirtualized scheduling" approach that allows the guest to send the
host information on spinlock usage that will allow the host scheduler to
adjust priorities of the guest processes for more efficient use of the
CPUs, while avoiding priority inversions.
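The bookkeeping behind such measurements can be sketched as a simple histogram that also tracks the running average and maximum; this is my own illustration, not Kiszka's code:

```c
#include <stdint.h>
#include <string.h>

#define NBUCKETS 64   /* one bucket per microsecond; last is overflow */

struct lat_hist {
    uint64_t bucket[NBUCKETS];
    uint64_t count;
    uint64_t sum_us;
    uint64_t max_us;
};

void hist_init(struct lat_hist *h)
{
    memset(h, 0, sizeof(*h));
}

/* Record one latency sample, in microseconds. */
void hist_add(struct lat_hist *h, uint64_t us)
{
    unsigned idx = us < NBUCKETS ? (unsigned)us : NBUCKETS - 1;
    h->bucket[idx]++;
    h->count++;
    h->sum_us += us;
    if (us > h->max_us)
        h->max_us = us;
}

/* Average latency over all samples recorded so far. */
uint64_t hist_avg_us(const struct lat_hist *h)
{
    return h->count ? h->sum_us / h->count : 0;
}
```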
ARM Fast Context Switch Extension for Linux
The organization of the ARMv5 cache can cause performance problems that may
preclude its use for realtime tasks. The cache is indexed by virtual
address and, since Linux processes share the same range of virtual addresses, each
context switch requires invalidating the cache. Depending on the CPU type,
memory speed, and the program's data access pattern, the cost of reloading a
process's data from main memory can be on the order of 200
microseconds—too much for many time-critical applications.
One alternative is to share a flat address space between all of the
processes, but then the memory protection provided by separate address spaces
is lost. Gilles Chanteperdrix and Richard Cochran describe
[PDF] another approach for doing context switches that preserves
the memory protections without sacrificing the cache at every context switch.
They use the ARM Fast Context Switch Extension (FCSE) and
partition the virtual address space into separate
32MB chunks so that processes do not have overlapping address ranges. This
allows for up to 128 processes running in the 3GB available for non-kernel
addresses. The translation lookaside buffer (TLB) must still be flushed on
context switches to enforce memory protection, but the data and instruction
caches are preserved.
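The address arithmetic involved is straightforward: FCSE uses a 7-bit process ID, and any virtual address below 32MB is relocated by placing that PID in the top seven address bits, yielding non-overlapping 32MB slots. The following is my own illustration of that mapping, not code from the paper:

```c
#include <stdint.h>

#define FCSE_SLOT_SHIFT 25                       /* 32MB slots */
#define FCSE_SLOT_SIZE  (1u << FCSE_SLOT_SHIFT)  /* 0x02000000 */
#define FCSE_NUM_SLOTS  128                      /* 7-bit PID  */

/* Emulate the FCSE relocation: virtual addresses below 32MB are moved
 * into the slot selected by the 7-bit FCSE PID; addresses at or above
 * 32MB pass through unchanged. */
uint32_t fcse_relocate(uint32_t va, unsigned pid)
{
    if (va < FCSE_SLOT_SIZE)
        return va | ((uint32_t)(pid & 0x7f) << FCSE_SLOT_SHIFT);
    return va;
}
```

Because each process lives in its own slot, the relocated addresses never collide, which is why the data and instruction caches can survive a context switch.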
The actual implementation required reducing the number of available
processes to 95. A limit of either 95 or 128 processes, along with the 32MB
address-space restriction, would be unacceptable for many embedded
applications, so the authors added a "best effort mode" that eliminates those
restrictions, but cannot guarantee that it won't have to do cache flushes on
some context switches. They reported that, compared to the standard Linux
kernel, average latencies for their test cases were reduced when the
"guaranteed" mode was used, and by roughly one-quarter in "best effort" mode.
Design and Implementation of Node Order Protocol
Distributed systems often use "time division multiple access"
(TDMA) as a means to coordinate access to a shared communications medium
(e.g. shared bus or wireless frequencies). But TDMA requires a reliable
means to synchronize the clocks on the various systems and that
synchronization uses some of the shared bandwidth simply for timekeeping.
The authors, Li Chanjuan, Nicholas McGuire, and Zhou Qingguo, propose
[PDF] a different protocol, Node Ordering Protocol (NOP), that avoids
much of the complexity and bandwidth waste that occur with TDMA.
As its name implies, NOP relies on a consistent ordering of the nodes in
the network. It also requires that nodes monitor each other in order to
detect a faulty node that is not correctly following the ordering scheme.
The advantages, according to the authors, are that NOP is much easier to
implement and validate than other protocols with complex synchronization
requirements, loss of bandwidth due to temporal padding is not required,
and that error detection is much simpler and bounded in time.
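The ordering idea can be illustrated with a toy check: given the agreed transmission order, each node verifies that the sender observed in each slot is the expected one. This sketch only illustrates the node-ordering concept; it is not the authors' protocol:

```c
#include <stddef.h>

/* Check one communication round against the agreed node order.
 * observed[] is the sequence of node IDs actually seen on the shared
 * medium; order[] is the agreed transmission order.  Returns the index
 * of the first slot where a node spoke out of turn, or -1 if the round
 * was clean. */
int check_round(const int *order, const int *observed, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (observed[i] != order[i])
            return (int)i;
    }
    return -1;
}
```

Because every node runs the same check on the same observed traffic, a misbehaving node is detected within one round, which is the bounded-time error detection the authors claim.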
One last paper to mention is the scholarly-sounding, if tongue-in-cheek, look
at cookie consumption and "the positive impact on the real-time Linux
community we were able to observe". The authors, M. Gleixner and
M. McGuire, look at various cookie protocols—with code—and
conclude that uni-directional protocols are best for real-time Linux
development: "Though greedy protocols have been discussed in the
past, we found that considering these has negative impacts on developers
long term and thus are deprecated."
The slides for some of the presentations are available
on the Open Source Automation Development Lab (OSADL) web site. There are
quite a few more papers available there than we were able to look at here.
While the papers can't really replace the experience of attending, there is
much of interest for those who are looking for more information on realtime
in Linux.