Brief items
The current stable 2.6 release is 2.6.16.9,
announced on April 19; it
contains a fix for an information leak vulnerability on some AMD
processors. Of the prior releases, 2.6.16.6 contains a fairly long
list of fixes, while 2.6.16.7 and 2.6.16.8 are single-patch
security fixes.
The current 2.6 prepatch is 2.6.17-rc2, announced by Linus on
April 18. There are a lot of fixes in this release, but it also
contains a simplified form of the scheduler starvation avoidance
patch, some tweaks to the memory overcommit algorithm, the removal of
the obsolete blkmtd and qlogicfc drivers, the removal of the unmaintained
Sangoma WAN drivers, the splice() and tee() system calls, and
pollable sysfs attributes.
See the long-format changelog for the details.
For the record, it is worth noting that the prototypes for the
splice() methods in the file_operations structure have
changed again. This week's version:
    ssize_t (*splice_write)(struct pipe_inode_info *pipe, struct file *out,
                            loff_t *offset, size_t len, unsigned int flags);
    ssize_t (*splice_read)(struct file *in, loff_t *offset,
                           struct pipe_inode_info *pipe, size_t len,
                           unsigned int flags);
The offset parameter, describing where in the stream I/O should
start, is new.
A few dozen patches (all fixes) have been merged into the mainline after
the -rc2 release.
The current -mm tree is 2.6.17-rc1-mm3. Recent changes
to -mm include an ACPI dock driver, i2c virtual adapter support, a number
of memory management tweaks, a trusted platform module (TPM) driver update,
and a new version of the zlib library.
Kernel development news
I don't think anyone is smart enough to configure Apache with
SELinux. I've installed Apache maybe 20 times in my life, which is
plenty, and I eventually realized it was SELinux and just turned
the damn thing off after an hour of trying to fix it.
-- Dave Aitel
Keep in mind as well that SELinux "complexity" is purely a
reflection of complexity in Linux; SELinux just exposes the
existing interactions and provides a way to control them. The
SELinux mechanism itself is fairly simple.
-- Stephen Smalley
The developers interested in containers and virtualization have discussed
interfaces to virtualize access to a number of system resources. None,
however, have talked about virtualizing access to the system time. Until
now, that is. With Jeff Dike's time virtualization patches, any process
tree can have its own idea of what time it is.
Jeff's patch adds a new "time namespace" structure to the task structure.
By default, all processes share the normal host system's idea of time. But
a new option (CLONE_TIME) to the unshare() system call
allows a process to disconnect from the system time. After such a call,
that process - and any children it creates - will be able to keep its own
time value. Setting a virtualized time value is, unlike changing the
normal system time, an unprivileged operation.
Internally, a virtualized time is stored as a simple offset; whenever a
process requests the current time, the offset is added to the current
system time and the sum is returned. This approach has the advantages of
being simple and fast; a process running with virtualized time also does
not give up time adjustments made, for example, by NTP. On the other hand,
this implementation does not support the ability to confuse processes by
messing deeply with their idea of time - running time at a different rate,
for example, or even backward. Chances are that this omission will not
upset more than a small percentage of potential users of virtualized time,
however.
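The offset scheme is easy to model in user space. What follows is only a sketch of the idea, not code from Jeff's patch: the time_ns structure and the ns_* helpers are hypothetical names invented for this illustration. Setting the virtual time records nothing but the difference from host time, and every query adds that difference back in.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical user-space model of a "time namespace": just an offset. */
struct time_ns {
    struct timeval offset;      /* added to the host time on every query */
};

/* Bring tv_usec back into the range 0..999999. */
void tv_normalize(struct timeval *tv)
{
    while (tv->tv_usec >= 1000000) { tv->tv_sec++; tv->tv_usec -= 1000000; }
    while (tv->tv_usec < 0)        { tv->tv_sec--; tv->tv_usec += 1000000; }
}

/* Virtual time is simply host time plus the stored offset. */
struct timeval ns_apply(const struct time_ns *ns, struct timeval host)
{
    struct timeval v;

    v.tv_sec = host.tv_sec + ns->offset.tv_sec;
    v.tv_usec = host.tv_usec + ns->offset.tv_usec;
    tv_normalize(&v);
    return v;
}

/* "Setting" the virtual clock only records its distance from host time. */
void ns_settime(struct time_ns *ns, struct timeval host, struct timeval wanted)
{
    ns->offset.tv_sec = wanted.tv_sec - host.tv_sec;
    ns->offset.tv_usec = wanted.tv_usec - host.tv_usec;
    tv_normalize(&ns->offset);
}

/* Read the virtualized clock; returns 0 on success, -1 on failure. */
int ns_gettimeofday(const struct time_ns *ns, struct timeval *out)
{
    struct timeval host;

    if (gettimeofday(&host, NULL) != 0)
        return -1;
    *out = ns_apply(ns, host);
    return 0;
}
```

Because only the offset is stored, any adjustment made to the host clock shows up in the virtual clock as well, which is the NTP-friendly property described above. It also shows why a different clock rate cannot be expressed: there is no place to put one.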
Jeff's purpose is to speed up the gettimeofday() system call in
User-mode Linux instances. If the kernel allows process subtrees to have
their own time values, then User-mode Linux can simply use the host's
gettimeofday() call, rather than intercepting that call and
implementing it itself. Since gettimeofday() is one of the most
frequently-used system calls, this optimization can make a significant
difference.
One other change is required, however, for User-mode Linux to get the
benefit from this change. UML performs much of its process control using
ptrace(); in particular, it intercepts and interprets system calls
with the PTRACE_SYSCALL operation. What is really needed for a
fast gettimeofday() is the ability to not intercept that
particular call. So Jeff's patch also extends ptrace() by adding
a PTRACE_SYSCALL_MASK operation. This new operation can set a
bitmask indicating which system calls should be intercepted, and which
should be executed without stopping.
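The article does not show the layout of the mask, so the following is a guess at the general shape rather than the interface the patch defines: one bit per system call number, where a set bit means "stop the traced process". The structure, the helper names, and the MAX_SYSCALLS bound are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MAX_SYSCALLS  512   /* assumed upper bound, illustration only */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS    ((MAX_SYSCALLS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical mask: a set bit means "intercept this system call". */
struct syscall_mask {
    unsigned long bits[MASK_WORDS];
};

/* Start from the classic PTRACE_SYSCALL behavior: stop on everything. */
void mask_intercept_all(struct syscall_mask *m)
{
    memset(m->bits, 0xff, sizeof(m->bits));
}

/* Let one system call run on the host without stopping the tracee. */
void mask_pass_through(struct syscall_mask *m, unsigned int nr)
{
    assert(nr < MAX_SYSCALLS);
    m->bits[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
}

/* The check a tracing core would make on each system call entry. */
int mask_should_intercept(const struct syscall_mask *m, unsigned int nr)
{
    assert(nr < MAX_SYSCALLS);
    return (int)((m->bits[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL);
}
```

A tracer in the style of UML would clear the single bit for gettimeofday() and leave the rest set; the per-call check is one shift and one mask, so it adds very little to the system call path.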
The result, with a suitably patched UML, is a gettimeofday() call
which runs at about 99% of the native process speed. That may well be good
enough to make this patch a piece of the growing set of interfaces
supporting virtualization and containers.
Dan Bonachea recently reported a problem.
It seems that he has a program where multiple threads are simultaneously
writing to the same file descriptor. Occasionally, some of that output
disappears - overwritten by other threads. Random loss of output data is
not generally considered to be a desirable sort of behavior, and,
says Dan, POSIX requires that write() calls be thread-safe. So he would
like to see this behavior fixed.
Andrew Morton quickly pointed out the
source of this behavior. Consider how write() is currently
implemented:
    asmlinkage ssize_t sys_write(unsigned int fd, const char __user *buf,
                                 size_t count)
    {
        struct file *file;
        ssize_t ret = -EBADF;
        int fput_needed;

        file = fget_light(fd, &fput_needed);
        if (file) {
            loff_t pos = file_pos_read(file);
            ret = vfs_write(file, buf, count, &pos);
            file_pos_write(file, pos);
            fput_light(file, fput_needed);
        }
        return ret;
    }
There is no locking around this function, so it is possible for two (or
more) threads performing simultaneous writes to obtain the same value for
pos. They will each then write their data to the same file
position, and the thread which writes last wins.
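The losing interleaving can be replayed deterministically in a small user-space model. This is not kernel code: model_file and its helpers are stand-ins invented here to mirror the file_pos_read() / vfs_write() / file_pos_write() sequence, with the two "threads" run by hand in the unlucky order.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for an open file: a position and some storage. */
struct model_file {
    long pos;
    char data[64];
};

/* Step 1 of the write path: sample the shared file position. */
long model_pos_read(const struct model_file *f)
{
    return f->pos;
}

/* Steps 2 and 3: write at the sampled position, then store the new one. */
void model_write_at(struct model_file *f, long pos, const char *buf, size_t len)
{
    memcpy(f->data + pos, buf, len);
    f->pos = pos + (long)len;
}

/*
 * Replay the bad ordering: writer B samples the position before
 * writer A has stored its update. Returns the final file position.
 */
long replay_race(struct model_file *f)
{
    long pos_a, pos_b;

    memset(f, 0, sizeof(*f));
    pos_a = model_pos_read(f);              /* A sees position 0 */
    pos_b = model_pos_read(f);              /* B also sees position 0 */
    model_write_at(f, pos_a, "AAAA", 4);    /* A writes bytes 0..3 */
    model_write_at(f, pos_b, "BBBB", 4);    /* B overwrites bytes 0..3 */
    return f->pos;
}
```

Eight bytes were written, but the model file ends up holding only the four from the second writer, with the position at 4 rather than 8.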
Putting some sort of lock (using the inode lock, perhaps) around the entire
function would solve the problem and make write() calls
thread-safe. The cost of this solution would be high, however: an extra
layer of locking when almost no application actually needs it. Serializing
write() operations in this way would also rule out simultaneous
writes to the same file - a capability which can be useful to some
applications.
So some developers have questioned whether this behavior should be fixed at
all. It is not something which causes problems for over 99.9% of applications,
and, for those which need to be able to perform this sort of simultaneous
write, there are other options available. These include user-space locking
or using the O_APPEND option. So, it is asked, why add
unnecessary overhead to the kernel?
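The O_APPEND alternative can be tried from user space without any threads at all. The sketch below is a deterministic stand-in for the threaded case rather than a reproduction of it: it opens the same file twice, so that both descriptors start with a file position of zero, and writes four bytes through each. The function name and the idea of passing in the extra open flags are choices made for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Write 4 bytes through each of two descriptors whose file positions
 * are both 0. Returns the resulting file size, or -1 on error. The
 * file at `path` is created and removed by this function.
 */
long two_stale_writers(const char *path, int extra_flags)
{
    struct stat st;
    long size = -1;
    int a, b;

    a = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0600);
    if (a < 0)
        return -1;
    b = open(path, O_WRONLY | extra_flags);
    if (b < 0) {
        close(a);
        unlink(path);
        return -1;
    }

    /* Each descriptor believes it is at offset 0 when it writes. */
    if (write(a, "AAAA", 4) == 4 && write(b, "BBBB", 4) == 4 &&
        fstat(a, &st) == 0)
        size = (long)st.st_size;

    close(a);
    close(b);
    unlink(path);
    return size;
}
```

Called with extra_flags set to 0, the second write lands on top of the first and the file is four bytes long. With O_APPEND, each write goes to the current end of the file regardless of the descriptor's stale position, and all eight bytes survive.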
Linus responds that it is a "quality of implementation" issue, and that if
there is a low-cost way of getting the system to behave the way users would
like, it might as well be done. His proposal is to apply a lock to the file
position in particular. His patch adds a f_pos_lock mutex to the
file structure and uses that lock to serialize uses of and changes
to the file position. This change will have the effect of serializing
calls to write(), while leaving other forms (asynchronous I/O,
pwrite()) unserialized.
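In user-space terms, the approach looks roughly like the following. The f_pos_lock name comes from the description above; everything else (the model_file structure, the in-memory storage, the helper names) is invented for this sketch and says nothing about how the actual patch is written.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical model of an open file with a lock on its position. */
struct model_file {
    pthread_mutex_t f_pos_lock;     /* guards pos, and only pos */
    long pos;
    char data[64];
};

#define MODEL_FILE_INIT { PTHREAD_MUTEX_INITIALIZER, 0, { 0 } }

/*
 * write()-style call: reading the position, writing the data, and
 * storing the new position all happen under f_pos_lock. Returns the
 * number of bytes written, or -1 if the data does not fit.
 */
long model_write(struct model_file *f, const char *buf, size_t len)
{
    long ret = -1;

    pthread_mutex_lock(&f->f_pos_lock);
    if (len <= sizeof(f->data) - (size_t)f->pos) {
        memcpy(f->data + f->pos, buf, len);
        f->pos += (long)len;
        ret = (long)len;
    }
    pthread_mutex_unlock(&f->f_pos_lock);
    return ret;
}

/*
 * pwrite()-style call: the caller supplies the offset, the shared
 * position is never consulted, and so no lock is taken.
 */
long model_pwrite(struct model_file *f, const char *buf, size_t len, long off)
{
    if (off < 0 || (size_t)off > sizeof(f->data) ||
        len > sizeof(f->data) - (size_t)off)
        return -1;
    memcpy(f->data + off, buf, len);
    return (long)len;
}
```

Two threads calling model_write() on the same structure can no longer sample the same position, so neither one's output is lost. Callers of model_pwrite() pay nothing for the lock, which matches the split between write() and pwrite() described above.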
The patch has not drawn a lot of comments, and it has not been merged as of
this writing. Its ultimate fate will probably depend on whether avoiding
races in this obscure case is truly seen to be worth the additional cost
imposed on all users.
Back in 2001, the very first Linux kernel summit included a discussion
on security policies.
At that meeting, it was decided that there was no interest in patching in
the several competing implementations which were available at that time.
Instead, developers interested in security were asked to create a generic
interface which could be used by any security policy. The result was the
Linux Security Modules (LSM) API - a long list of hooks which can be used
to intercept almost any operation of interest within the kernel.
Last year, some developers were heard to mumble that perhaps LSM should be
removed from the kernel. Since LSM was merged, there has been only one
serious security mechanism using it to emerge: SELinux. Since there is
only one LSM user, and since SELinux can be thought of as a fairly generic
security framework in its own right, it is not clear that there is a need
for the LSM interface. The discussion died down last year, however, and
there has been little talk of yanking out LSM.
Until now. In response to a current discussion on LSM hooks, James Morris
has posted a patch adding LSM
to the "feature removal" schedule. The end of LSM is not a distant event
either: the proposed date is this coming June - the 2.6.18 kernel, in other
words. If this patch goes through, LSM will be gone in the very near
future.
The early indications suggest that it could go through: several kernel
developers have argued in favor of the removal of LSM, while none has
asked for it to be retained. The only disagreement - mild - was over the
removal date, with some arguing that 2.6.18 is too soon. Those in favor of
an early removal, however, claim that last year's discussion should count
as the usual one-year warning for this sort of change, and that there is no
need to wait any longer.
One might well wonder what the hurry is to remove this API from the
kernel. There is, in fact, more than just the "only one user" argument in
circulation. James's patch includes this text:
[LSM] also attracts a regular stream of misconceived and broken
security module submissions to mainline, such as BSD Security
Levels, and developers are seeing LSM as the answer to everything
rather than really thinking about what they need and how to
architect the code properly and generally.
So LSM becomes a general temptation to solve problems in the wrong way.
Beyond the security levels module (which, among other things, is seen as
having open vulnerabilities and no maintainer interest), the developers may
be thinking of past episodes like the debate over the realtime security
module or the Integrity Measurement Architecture, neither of which is
best implemented as a security module.
The real issue, however, may be this one:
There is also a growing number of proprietary modules hooking into
LSM in unsafe ways, not necessarily even for security purposes. The
LSM interface semantics are too weak and such an API does not
belong in the mainline kernel.
The 2.6 kernel - intentionally - does not give loadable modules access to
the system call table. But the LSM interface is almost as good - it gives
a loadable module the opportunity to intercept almost any operation that
the kernel may attempt to perform. The LSM hooks are supposed to limit
themselves to internal record keeping and returning an allow/deny status to
the kernel - but there is no way to enforce that sort of restriction. The
GPL-only status of the LSM API does not help much either.
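The shape of the problem is visible even in a toy model. The code below is a much-reduced, hypothetical stand-in for a hook table (one hook instead of well over a hundred, and none of the real kernel structures or names): the core calls through a function pointer and treats the return value as an allow/deny verdict.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAY_WRITE 2   /* illustrative permission bit */

/* Hypothetical hook table with a single hook. */
struct model_security_ops {
    int (*file_permission)(const char *path, int mask);
};

/* Default policy: allow everything. */
int default_file_permission(const char *path, int mask)
{
    (void)path;
    (void)mask;
    return 0;
}

struct model_security_ops default_ops = { default_file_permission };
struct model_security_ops *active_ops = &default_ops;

/* Only one policy module may be registered at a time in this model. */
int model_register_security(struct model_security_ops *ops)
{
    if (active_ops != &default_ops)
        return -1;
    active_ops = ops;
    return 0;
}

void model_unregister_security(void)
{
    active_ops = &default_ops;
}

/* What the core does before a write: ask the hook, obey the verdict. */
int model_write_check(const char *path)
{
    return active_ops->file_permission(path, MODEL_MAY_WRITE);
}

/* Example policy module: refuse writes to anything under /etc/. */
int etc_readonly_permission(const char *path, int mask)
{
    if ((mask & MODEL_MAY_WRITE) && strncmp(path, "/etc/", 5) == 0)
        return -EACCES;
    return 0;
}

struct model_security_ops etc_readonly_ops = { etc_readonly_permission };
```

The example policy behaves itself: it looks at its arguments and returns a verdict. Nothing in the calling convention requires that, though. The hook is ordinary code running at the point of interception, and it can do whatever it likes before returning, which is the weakness in the interface semantics that the quoted text complains about.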
The people involved are wary of publicly pointing fingers at companies
suspected of misusing the LSM interface. One example which can be found,
however, is the kernel generalized event
management module which was posted to the kernel-mentors list last
year. When KGEM was loaded, it would shove aside any currently-loaded
security policy and install itself in its place. It would then feed
security-related events through to a (proprietary) user-space application,
which would make decisions aimed at protecting Linux users from the
pressing threat of virus attacks. There were a lot of issues over how this
module was implemented, but using LSM to override existing security
policies and provide hooks for proprietary code was considered especially
distasteful.
These reasons and strong developer pressure notwithstanding, it is not clear that
LSM will actually go away anytime soon. There is not yet a consensus that
SELinux should be seen as the One True Security Policy; many potential
users find its complexity hard to deal with and often simply turn it off.
The power of SELinux is unquestioned, but its usability is another story.
There are other users of the LSM API out there; they just have not been
submitted for inclusion into the mainline. These include:
- Novell's AppArmor, which is the security policy shipped with current
SUSE releases. AppArmor is free software, but has never been
submitted for review. The discussion of removing the LSM interface
appears to have lit a fire under some rear ends at Novell, and
the first AppArmor submission is said to be imminent. (In fact, it
was posted just after this article was published.)
Some of the early discussion, however, suggests that AppArmor could
have a hard
path into the mainline. In particular, its use of file pathnames as
the core of its security policy has been strongly questioned. In a
system capable of hard and soft links, multiple namespaces, shared
subtrees, and more, the meaning of any specific pathname is far from
clear. That is why SELinux uses extended attributes to apply
labels directly to files, rather than relying on their pathnames.
- The Linux Intrusion Detection System (LIDS) is an LSM user. The LIDS
developers have asked that
LSM not be removed, but have not made any statements regarding if and
when they might submit their module for merging.
- The Dazuko module is used by tools
like ClamAV. Dazuko seems somewhat like KGEM, in that it exports an
interface for user-space programs to make decisions. It is not clear
that such an interface can ever make it through the review process.
- Multiadm is a module which allows privileges to be handed out to
non-root users.
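The pathname objection raised against AppArmor in the list above is easy to demonstrate with nothing more than a hard link. The sketch below uses only standard POSIX calls; the function names and the idea of passing both paths in are choices made for this illustration. It gives one file a second name and then checks whether the two names refer to the same underlying object.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both pathnames name the same inode, 0 if not, -1 on error. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/*
 * Create `path`, give it a second name `alias` with a hard link, and
 * report whether the two names refer to the same object. Both names
 * are removed again before returning.
 */
int alias_demo(const char *path, const char *alias)
{
    int fd, result;

    unlink(path);
    unlink(alias);
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    if (link(path, alias) != 0) {
        unlink(path);
        return -1;
    }
    result = same_file(path, alias);
    unlink(alias);
    unlink(path);
    return result;
}
```

A rule written against the first name says nothing about the second, even though both lead to the same inode. A label stored in an extended attribute belongs to the inode itself and so follows the file under every name it has.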
Given that security is something other than a completely solved problem, it
would be surprising if there were any single approach which was suitable
for all users. So something may well emerge and qualify as the second user
which keeps the LSM API in place.
Or, at least, which keeps some sort of API in place. If LSM stays around,
the kernel developers will probably make changes which make the API harder
to abuse. These might include finding ways to restrict what LSM hooks can
do and providing compile-time options to wire in a single security policy
at kernel build time. So, while there is a reasonable chance that future
kernels will include an LSM interface, it might be a rather different
interface than the one there today. Any security module developers who
want to have a say in how the interface evolves would be well advised to
join the discussion soon.
Page editor: Jonathan Corbet