Brief items
The current 2.6 prepatch is 2.6.11-rc3,
released by Linus on February 2. This
prepatch adds an XFS update, a set of out-of-memory killer fixes, a generic
transport class mechanism (which replaces the SCSI transport code), some
architecture updates, the removal of
bcopy(), a fix for writable
module parameters in sysfs (it never actually worked before), and various
fixes. See
the long-format changelog for
the details.
Linus's BitKeeper repository contains a small number of patches, including
some IDE updates, some additional checking in read() and
write() (see below), a DMA blacklist for problematic serial ATA
drives, and a handful of fixes.
The current -mm tree is 2.6.11-rc3-mm1.
Recent changes to -mm include a firewire update, the address space
randomization patches (covered on last week's security page), the
"BIO pool" mechanism, the removal of the realtime rlimit patch (see below),
and more fixes.
There have still been no 2.4.30 prepatches.
Kernel development news
A large part of kernel history is currently practically locked into
bk. bk isn't doing what I need, so naturally I'm looking for
alternatives, but I don't have the freedom to take my data and try
it with some other tool. Was that really part of the deal when bk
was introduced that I'm denied of this freedom?
--
Roman Zippel
It's exactly the same as a file system. If you put some files into
a file system does the file system creator owe you the knowledge of
how those files are maintained in the file system? Since when is
that part of the deal?
--
Larry McVoy
If you must follow this conversation, the thread can be
found over
here.
Two weeks ago, it appeared
that a solution to the problem of low-latency scheduling for audio
applications had been found. Ingo Molnar's approach, which allowed
unprivileged processes to use the realtime scheduling modes as long as they
did not use more than an administrator-specified portion of the available
CPU time, seemed like a reasonably straightforward way to go. Ingo's patch
had gone into the -mm tree for further testing.
The rlimit approach keeps a rogue process from taking over the system
entirely. It does not, however, prevent abuse by poorly-behaved software.
If even limited access to realtime scheduling became widely available on
Linux systems, it would only be a matter of time until developers figured
out that they could make their programs seem faster by using the realtime
mode. Proprietary applications could be particularly problematic in this
regard; distributors would likely rip out unwarranted realtime scheduling
calls in free software that they ship, but that cannot be done with
proprietary code.
Other concerns with the rlimit approach include the need for some audio
applications to get fast access to the CPU even if they require 100% of the
available time, and general unease with tweaking the scheduler for this
use. The end result is that the rlimit patch has come back out of -mm, and
Ingo has said:
i'm not opposed to the LSM solution per se, especially given that
none of the other solutions in existence are fully satisfactory
(and thus acceptable for the scheduler currently). The LSM patch is
clearly the least intrusive solution.
Those who have been following the discussion will remember that the whole
long debate began because certain kernel developers did not feel that the
realtime security module (which gives members of an administrator-specified
group access to realtime scheduling) was acceptable for inclusion. So the
discussion has come back to where it started, and it appears that the
realtime security module will be merged (though that had not happened as of
this writing). Ingo apologized for the
whole thing, explaining it this way:
it is just an unfortunate situation that the issue here is _not_
clear-cut at all. It is a longstanding habit on lkml to try to
solve things as cleanly and generally as possible, but there are
occasional cases where this is just not possible.
One remaining problem with the realtime security module is that it gives
audio users the right to monopolize the processor with any program they
run, not just audio utilities. Making the audio programs run in a setgid
mode might seem like a way around that issue, except for the fact that the
GTK+ toolkit actively prevents things from
working that way. The unfortunate result is that users must be given more
privilege than they actually need. Most of the time, that should be
acceptable; multi-user audio workstations are likely to be relatively
rare.
Long ago, when the 2.0 kernel was the state of the art, the implementation
of the read() and write() system calls (and readv() and writev() too)
behaved a little differently than now. Then, as now, the main purpose of the
core implementation of those system calls was to pass the call on to the
appropriate function in the filesystem code or device driver handling the
file of interest after dealing with any relevant file locking details. In
many ways, sys_read() and friends in 2.6 look very much like they did in
2.0.
The 2.0 implementation differed, however, in that it checked whether the
calling process had the ability to read or write the buffer it passed into
the kernel. The semantics of a read() call, say, should be the
same regardless of where the data is being read from. So it made sense to
check, before invoking the VFS or device driver, that the buffer passed to
read() was writable by the calling process.
In 2.2, that check went away, possibly as part of the big changes made to
how user-space access checks were implemented. Performing those checks
became entirely the responsibility of the lower-level code.
Linus recently merged a patch which restores the
upper-level checks for 2.6.11. The reason given with the patch is that
checks performed in lower-level code only verify the range of memory which
will actually be read from or written to. If that range is smaller than
the application requested (because the file is not that long, say), part of
the range requested by the application will not be checked. The operation
of the system is entirely correct in this case, but an opportunity to flag
a bug in the calling program will have been missed.
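The difference between the two checking strategies can be illustrated with a small user-space model. This is only a sketch and not the kernel's code: USER_LIMIT, user_range_ok(), and the two read functions are hypothetical names invented for the example, and the "file" is reduced to nothing more than a length.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical boundary between user and kernel addresses (an assumption
 * made for this sketch; real values are architecture-specific). */
#define USER_LIMIT ((uintptr_t)0xC0000000u)

/* True if [addr, addr + len) lies entirely below USER_LIMIT.
 * Written so that addr + len is never computed and cannot wrap. */
bool user_range_ok(uintptr_t addr, size_t len)
{
    return len <= USER_LIMIT && addr <= USER_LIMIT - len;
}

/* Lower-level check only: validates just the bytes that will be moved,
 * which may be fewer than the caller asked for. */
long read_lower_check_only(uintptr_t buf, size_t count, size_t file_len)
{
    size_t n = count < file_len ? count : file_len;

    if (!user_range_ok(buf, n))
        return -EFAULT;
    return (long)n;    /* pretend n bytes were copied */
}

/* Upper-level check restored: validates the whole requested range
 * before handing the call to the lower layer. */
long read_upper_check(uintptr_t buf, size_t count, size_t file_len)
{
    if (!user_range_ok(buf, count))
        return -EFAULT;
    return read_lower_check_only(buf, count, file_len);
}
```

In this model, a 4096-byte request into a buffer starting 16 bytes below the limit, made against an 8-byte file, succeeds in read_lower_check_only() but fails with -EFAULT in read_upper_check(). That is the class of application bug the restored check is meant to flag.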
It also doesn't hurt that placing the check at the entry point to the
kernel ensures that it will be done in all situations. One less
opportunity for security problems resulting from forgotten checks in
lower-level code can only be a good thing. It seems almost certain that at
least one such vulnerability must exist somewhere in the 2.6 kernel.
One might conclude that low-level code, such as device drivers, need no
longer perform the access_ok() check, since it is now being
handled at a higher level. A prudent developer, however, would probably
leave that check in place. It is quite cheap on most architectures (it
generally just ensures that the given buffer is not located in kernel
space), and the higher-level checks went away once before. Safe is better
than sorry, especially when being safe is so easy.
(For completeness, it's worth noting that Linus merged another patch which ensures that a read or
write operation does not overflow the file offset.)
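The kind of test such an offset check involves might look like the following. This is a minimal sketch that assumes a signed 64-bit file offset; the function name offset_range_ok() is hypothetical and is not taken from the merged patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if transferring count bytes starting at offset pos
 * leaves the resulting offset representable as a non-negative
 * signed 64-bit value. The subtraction form avoids computing
 * pos + count, which could itself overflow. */
bool offset_range_ok(int64_t pos, size_t count)
{
    if (pos < 0)
        return false;
    if ((uint64_t)count > (uint64_t)INT64_MAX)
        return false;
    return (uint64_t)count <= (uint64_t)INT64_MAX - (uint64_t)pos;
}
```

The comparison is done by subtracting from the maximum rather than adding to the offset, since signed overflow in C is undefined and an addition-based test could be optimized away.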
The kernel has, for a while now, been accumulating hooks for informing user
space when things happen. Some of the current mechanisms include:
- The hotplug mechanism, which invokes a user-space program
(/sbin/hotplug by default) when kobjects are registered or
unregistered (generally in response to the addition or removal of
hardware on the system).
- The Linux security module (LSM) hooks, which enable a loadable module
to respond to (and possibly veto) dozens of actions by user-space
processes. The LSM mechanism is used by, among other things, SELinux
and the realtime LSM module.
- The lightweight audit framework, which uses a
netlink socket to pass information on kernel events to user space,
with the idea that these events will be logged somewhere.
- The kernel events mechanism, which
also uses netlink, is a simple scheme for notifying user space of
events which might be of interest to the user(s).
One might think that, at this point, the kernel is sufficiently well
instrumented that more hooks would be unnecessary. But more are on the
way.
One of those is the relay fork module,
proposed by Guillaume Thouvenin. Its sole purpose is to inform interested
user-space processes when a process forks; the intended user is the enhanced Linux system accounting
project. Rather than use one of the existing mechanisms for conveying
information to user space, the relay fork patch works by sending a signal
to the interested process(es) whenever a fork occurs.
The patch works by adding a new sysfs directory (/sys/relayfork)
with a couple of control attributes. The attribute signal
controls which signal is sent; by default, signal 33 (which is in the
realtime signal range on most architectures) is used. The other attribute
(processes) contains a list of the processes receiving these
signals. Registering a process for receipt of "relay fork" signals is
simply a matter of writing its process ID to the processes
attribute.
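From user space, registration under the proposed interface could look something like the sketch below. It rests on assumptions: the attribute path is built from the directory and attribute names given above, the process ID is assumed to be written as decimal text followed by a newline (the patch description only says the ID is written), and the function names are hypothetical. The interface itself exists only in the posted patch.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Path assembled from the directory and attribute names described in
 * the text; only meaningful on a kernel carrying the proposed patch. */
#define RELAYFORK_PROCESSES "/sys/relayfork/processes"

/* Write pid, as decimal text, to the attribute file at path.
 * Returns 0 on success and -1 on any failure. */
int register_for_fork_signals(const char *path, pid_t pid)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fprintf(f, "%ld\n", (long)pid) > 0;
    if (fclose(f) != 0)    /* a write error may only surface at close */
        ok = 0;
    return ok ? 0 : -1;
}

/* Convenience wrapper: register the calling process. */
int register_self(void)
{
    return register_for_fork_signals(RELAYFORK_PROCESSES, getpid());
}
```

A process using this would also need to install a handler for the configured signal before registering. Taking the path as a parameter keeps the helper usable against an ordinary file for testing.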
This patch may eventually go in, but probably not with the signal
mechanism. Guillaume was encouraged to use the kernel events mechanism
instead, and he has agreed that it is a workable solution.
Meanwhile, the vSecurity project is working
to put together a number of hardening technologies in a form suitable for
merging into the mainline. To that end, a couple of new LSM hooks have
been proposed. This one adds a hook for
invocations of the chroot() call, which, interestingly, has no
such hook now. The purpose is not so much to control the use of
chroot() as to note that it has happened and take steps, in other
security hooks, to ensure that the process does not break out of its
restricted subtree.
The other patch adds a hook to
chmod(). This one is unlikely to be merged, since a separate
hook, which is called for inode attribute changes, already exists. The
vSecurity hacker (Lorenzo Hernández García-Hierro) has indicated that he
has other hooks he wishes to place, but those have not yet been posted for
review.
Page editor: Jonathan Corbet