Brief items
The current stable 2.6 kernel is 2.6.11.10,
released on May 16 in response to yet
another serious security hole.
The current 2.6 prepatch remains 2.6.12-rc4. Linus has returned
from his vacation and has merged about 150 patches into his git repository;
these patches consist almost exclusively of security fixes, architecture
updates, and various other important fixes.
The current -mm tree is 2.6.12-rc4-mm2. Recent additions
to -mm include the IPSec tree, some KProbes work, the fork connector patch
(for process accounting), a DVB update, an ALSA update, a NUMA-aware slab
allocator, and more fixes. Note that there is now a mailing list for
people who would like to be notified when patches are added to -mm; see the
2.6.12-rc4-mm2 introduction
for subscription information.
The current 2.4 prepatch is 2.4.31-pre2, which was released by Marcelo on May 12. It
contains a fix for the ELF core dump vulnerability and a small number of
other patches.
Kernel development news
Hyperthreading (Intel's implementation of simultaneous multithreading) is a hardware technique used
to squeeze more performance out of modern processors. A hyperthreaded
processor appears, in many ways, to be a set of two independent
processors. These two processors share the same hardware, however,
with only the processor registers and other state-dependent information
being kept separate. The two logical CPUs compete for the same execution
units, so they cannot both run at full speed at the same time.
Hyperthreading helps performance because processors often stall,
waiting for memory accesses. When one processor in a hyperthreaded set
must wait, the other can be executing. Hyperthreading thus enables greater
utilization of the processor hardware; the resulting performance gains are
said to be anywhere from 5% to 30%, depending on the workload.
One of the resources shared by hyperthreaded processor sets is the memory
cache. This sharing has its advantages: if processes running on the two
processors are sharing memory, that memory need only be fetched into the
cache once. That kind of sharing happens often; shared libraries are one
obvious example. The shared cache also makes moving processes between
hyperthreaded processors an inexpensive operation, so keeping loads
balanced across the system is easier.
The sharing of caches between hyperthreaded processors is also, however,
the cause of a vulnerability identified in a heavily trailered report
by Colin Percival. The core of the problem is that, by measuring the
latency of specific memory accesses, a process can tell whether a given
memory location was present in the processor cache or not. A hostile
process can load the cache with its own memory, wait a bit, then run tests
to see which locations have been evicted from the cache. From that
information, it can make inferences about which memory locations were
accessed by the sibling processor in the hyperthreaded set.
Two cooperating processes, running at different privilege levels, could
make use of the cache to set up a covert channel for communication. In a
highly secured system, these two processes might not be able to talk to
each other at all normally. With a covert channel in place, information
can be leaked from a privileged level to one less privileged, leading to
all kinds of dreadful consequences - for somebody. Most systems, however,
are not overly concerned about this sort of covert channel; there are
easier ways to deliberately leak information.
Mr. Percival, however, also shows how the vulnerability can be exploited to
obtain information from processes which are not cooperating. In
particular, he claims that it can be used to steal keys from cryptographic
applications. A number of crypto algorithms have data-dependent memory
access patterns; an attacker who can watch memory accesses can, for some
algorithms, derive the key which was being used. The exploit discussed in
the report attacks OpenSSL's RSA signing code in this way.
The paper makes a number of recommendations on steps which can be taken to
mitigate this problem. The simplest is to disable hyperthreading; on
Linux systems, that is a matter of configuring out hyperthreading
support or booting with the noht option. Alternatively, the
kernel could take care not to schedule potentially unfriendly processes on
the same hyperthreaded set. Removing access to a high-resolution clock
would make the necessary timing information unavailable, thus defeating
such attacks. Cryptographic algorithms could be rewritten to avoid
data-dependent memory access patterns. Processors could be redesigned
to not share caches between hyperthreaded siblings, or to use a cache
eviction algorithm which makes it harder to determine which cache lines
have been removed.
The Linux scheduler could certainly be changed to defeat attempted cache-based
attacks on hyperthreaded processors, but the chances of that happening are
small. There are numerous obstacles to any sort of real-world exploit of
this vulnerability. The attacker must be able to run a CPU-intensive program on
the target system - without being noticed - and ensure that it remains on
the same hyperthreaded processor as the cryptographic process. The data
channel is noisy at best, and it will be made much more so by any other
processes running on the system. Timing the attack (knowing when the
target process is performing cryptographic calculations, rather than doing
something else) is tricky. Getting past all these roadblocks is
likely to keep a would-be key thief busy for some time.
In other words, there are almost certainly more effective ways of attacking
cryptographic applications. Closing this particular hole is unlikely to be
worth the trouble, extra complexity in the kernel, and performance impact
it would require. So this vulnerability, despite all the press it has
received, will probably not lead to any changes to the kernel in the near
future. Anybody who is truly worried about this problem will be best off
simply turning off hyperthreading for now. In the longer term, authors of
cryptographic code may find that they need to add avoidance of
data-dependent memory access patterns to their arsenal of techniques.
John Stultz's new core time subsystem was covered on this page
back in January. This patch set, which will
be submitted soon for inclusion (into -mm), replaces a mess of
architecture-specific time implementations with a cleaner, central time
subsystem which can take full advantage of hardware time sources. Nishanth
Aravamudan would now like to take advantage of the new low-level time code
by replacing the kernel timer implementation. This work, if accepted, will
lead to the incorporation of a new timer API to be used by kernel code when
a function must be called at some point in the future.
In current Linux kernels, internal time (for most purposes) is measured in
"jiffies," which is really just a counter that is incremented on each
timer interrupt. The new time code supersedes
jiffies with an absolute, monotonically increasing count of
nanoseconds. References to jiffies thus become a call to:
nsec_t do_monotonic_clock(void);
Using nanoseconds allows kernel code to work with high-resolution time in
real-world units. That, in turn, lets kernel developers forget about the
(error-prone) conversions between jiffies and real-world time
which are currently necessary.
Nishanth's add-on patch changes the timer subsystem to use nanoseconds as
well. The current add_timer() and mod_timer() interfaces
remain supported, but are deprecated. The new interface for setting (or
modifying) a timer is:
int set_timer_nsecs(struct timer_list *timer, nsec_t expires);
void set_timer_on_nsecs(struct timer_list *timer, nsec_t expires, int cpu);
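These functions set the given timer to go off at expires, which is an
absolute nanoseconds count; set_timer_on_nsecs() also takes a cpu
argument, presumably (like the existing add_timer_on()) naming the
processor on which the timer should run. Usually, expires will be
calculated by adding the desired delay (in nanoseconds) to whatever
do_monotonic_clock() returns.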
It's worth noting that this patch changes the meaning of the
expires field in the timer_list structure. This field is
now represented in an internal "timer intervals" unit, rather than in
jiffies. If the old add_timer() and mod_timer()
interfaces are used, the expires field will be silently converted
to the internal format. Code which performs calculations on
expires (by increasing the delay and calling mod_timer(),
for example) could be in for a surprise.
This patch also deprecates schedule_timeout(), in favor of these
functions:
nsec_t schedule_timeout_nsecs(nsec_t timeout);
unsigned long schedule_timeout_usecs(unsigned long usecs);
unsigned int schedule_timeout_msecs(unsigned int msecs);
All three of these functions will set a timer for the given delay (which is
a relative value, not absolute), then call schedule().
The creation of tightly-connected clusters requires a great deal of
supporting infrastructure. One of the necessary pieces is a lock manager -
a system which can arbitrate access to resources which are shared across
the cluster. The lock manager provides functions similar to those found in
the locking calls on a single system - it can give a process read-only
or write access to parts of files. The lock management task is complicated
by the cluster environment, though; a lock manager must operate correctly
regardless of network latencies, cope with the addition and removal of
nodes, recover from the failure of nodes which hold locks, etc. It is a
non-trivial problem, and Linux does not currently have a working
distributed lock manager in the mainline kernel.
David Teigland (of Red Hat) recently posted
a set of distributed lock manager patches (called "dlm"), with a request for inclusion
into the mainline. This code, which was originally developed at Sistina,
is said to be influenced primarily by the venerable VMS lock manager. An
initial look at the code confirms this statement: callbacks are called
"ASTs" (asynchronous system traps, in VMS-speak), and the core locking call
is an eleven-parameter monster:
int dlm_lock(dlm_lockspace_t *lockspace,
int mode,
struct dlm_lksb *lksb,
uint32_t flags,
void *name,
unsigned int namelen,
uint32_t parent_lkid,
void (*lockast) (void *astarg),
void *astarg,
void (*bast) (void *astarg, int mode),
struct dlm_range *range);
Most of the discussion has not been concerned with the technical issues,
however. There are some disagreements over issues like how nodes should be
identified, but most of the developers who are interested in this area seem
to think that this implementation is at least a reasonable starting point.
The harder issue is figuring out just how a general infrastructure for
cluster support can be created for the Linux kernel. At least two other
projects have their own distributed lock managers and are likely to want to
be a part of this discussion; an Oracle developer
recently described the posting of dlm as "a
preemptive strike." Lock management is a function needed by most
tightly-coupled clustering and clustered filesystem projects; wouldn't it
be nice if they could all use the same implementation?
The fact is that the clustering community still needs to work these issues
out; Andrew Morton doesn't want to have to make
these decisions for them:
Not only do I not know whether this stuff should be merged: I don't
even know how to find that out. Unless I'm prepared to become a
full-on cluster/dlm person, which isn't looking likely.
The usual fallback is to identify all the stakeholders and get them
to say "yes Andrew, this code is cool and we can use it", but I
don't think the clustering teams have sufficent act-togetherness to
be able to do that.
Clustering will be discussed at the kernel summit in July. A month before
that, a clustering workshop will be held in Germany.
In the hope that these two events will help bring some clarity to this
issue, Andrew has said that he will hold off on any decisions for now.
Page editor: Jonathan Corbet