The current 2.6 kernel is 2.6.5, which was announced by Linus on April 3. Changes
since -rc3 include another ALSA update and some architecture updates, among other things.
Linus's BitKeeper repository has no new patches; he is off the net for the
week. In its place, Andrew Morton has put together a "merge candidate"
tree, the current release of which is 2.6.5-mc2. This tree contains the laptop mode
patches, a set of ReiserFS updates, IPv6 support for SELinux, the
lightweight auditing framework (see below), the POSIX message queues patch,
the fcntl() file_operations method (covered here last month), some virtual memory improvements,
non-exec stack support, various architecture updates, and lots of fixes -
207 patches in all.
The current -mm tree is 2.6.5-mm2; recent
additions to -mm include some software suspend fixes, an autofs4 update,
and more fixes. The 4G/4G virtual memory patch has been dropped for now;
it was suspected of causing some problems, and it gets in the way of the
other virtual memory work being done.
The current 2.4 prepatch is 2.4.26-rc2, which was released by Marcelo on April 5. This
patch adds a relatively small number of fixes, along with some IDE
updates and an XFS update.
Kernel development news
A recent posting to
linux-kernel announced the creation of a new mailing list, hosted at OSDL,
for the discussion of device naming schemes. The Linux Standard Base does
not currently specify device names, but its maintainers would like to
change that. To that end, they are seeking input on how devices should be
named on Linux systems.
The discussion, so far, has centered around a proposal (available in
PDF format) from SUSE. Its purpose is to create a set of persistent
device names which will remain valid even in a hotpluggable world where the
hardware configuration can change at any time. To that end, the proposal
creates a version of /dev which is radically different from
anything seen on current Linux systems.
All of the current device names found in /dev are relegated to the
category of "compatibility names." They will still exist, but the proposal
suggests that they should be maintained by udev, rather than being
a static part of the system. The new names, instead, will all be found in
subdirectories under /dev. Disks will be in /dev/disk
(with a "k"), and other obvious device classes will get directories of their
own, such as /dev/printer, /dev/cdrom (these,
evidently, are not "disks"), or /dev/modem.
The proposal calls for another level of subdirectories before you find any
actual device names. Each of the /dev subdirectories would be
further divided into by-path, which names each device by how it is
connected to the system; by-serial, which uses the device's model
name and serial number; by-uuid, which uses a device's "universally
unique identifier"; and by-label, which uses a device's filesystem
label. Thus, a system's root partition might have all of the
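As an illustration of how the four subdirectories relate, the C sketch below composes a name under each scheme for a given device class. The `persistent_name()` helper, its escaping of slashes and blanks as underscores, and the sample keys are hypothetical; the proposal's actual name formats, and the way udev would create them, may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The four naming schemes described in the proposal. */
enum name_scheme { BY_PATH, BY_SERIAL, BY_UUID, BY_LABEL };

static const char *scheme_dir(enum name_scheme s)
{
    switch (s) {
    case BY_PATH:   return "by-path";
    case BY_SERIAL: return "by-serial";
    case BY_UUID:   return "by-uuid";
    case BY_LABEL:  return "by-label";
    }
    return "unknown";
}

/*
 * Hypothetical helper: build "/dev/<dev_class>/<scheme>/<key>" into out.
 * Slashes and blanks in the key (a filesystem label may contain either)
 * are replaced with '_' so the key stays a single path component.
 * Returns 0 on success, -1 if out is too small.
 */
static int persistent_name(char *out, size_t outlen, const char *dev_class,
                           enum name_scheme s, const char *key)
{
    assert(out != NULL && dev_class != NULL && key != NULL);

    int n = snprintf(out, outlen, "/dev/%s/%s/", dev_class, scheme_dir(s));
    if (n < 0 || (size_t)n >= outlen)
        return -1;

    size_t pos = (size_t)n;
    for (const char *k = key; *k != '\0'; k++) {
        if (pos + 1 >= outlen)          /* leave room for the terminator */
            return -1;
        out[pos++] = (*k == '/' || *k == ' ' || *k == '\t') ? '_' : *k;
    }
    out[pos] = '\0';
    return 0;
}
```

With this sketch, a filesystem labelled `root` on a disk would be reachable as `/dev/disk/by-label/root`, alongside whatever `by-path`, `by-serial`, and `by-uuid` names the same partition received.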
The use of multiple names for the same device does not sit well with
everybody; fears have been expressed that it could confuse users and
applications which perform user-space locking by device name. The
by-path names were received critically; since the path can change
on a modern system, those names will never be persistent. There were also
complaints about by-label and by-uuid; those names are
meant to allow Linux systems to find and mount disks regardless of their
position in the device hierarchy, but the mount utility already
implements that functionality.
While there have been complaints about the SUSE proposal, there have not,
thus far, been a lot of alternatives put forward. Something, however, is
clearly going to have to change. A Fedora Core 2 Test 2 system has almost
19,000 entries under /dev; this mass of names can only grow larger
and less maintainable. Nor does it address the dynamic
nature of devices in modern systems. Device naming looks to be an
interesting issue for some time to come.
The kernel capability mechanism gives (relatively) fine-grained control
over what actions any given process can perform. The various capabilities
include the ability to override file permissions, send signals to other
processes, bind to low-numbered ports, and many other tasks. There have
been visions over the years of exporting capabilities to user space and
eliminating the "all-powerful superuser" concept, but none of those visions
has been implemented in any widely distributed way.
One of the capabilities is called CAP_IPC_LOCK; it gives a process
the ability to lock a region of virtual memory into physical RAM. This
capability needs to be controlled; otherwise a rogue process could lock up
all of physical memory and effectively shut down the system. There are,
however, legitimate reasons for giving this capability to normal users.
Programs which handle encryption (such as gpg) would like to lock in some
of their memory so that passphrases and clear text do not get written out
to swap. Systems like Oracle need the capability to lock in their shared
segments (since they do their own paging, essentially) and to be able to
allocate large page "hugetlb" segments.
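The gpg case needs nothing more exotic than mlock(). The sketch below (helper names are hypothetical) maps one anonymous page for a secret, tries to pin it, and wipes it before release. On kernels of the vintage discussed here, an unprivileged caller lacking CAP_IPC_LOCK would be expected to get EPERM; a kernel that enforces a per-user limit instead might report ENOMEM or EAGAIN, so the sketch treats any failure as "not locked" and leaves the fallback to the caller.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS under strict -std */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one anonymous page and try to pin it in RAM so that a secret stored
 * there is never written to swap.  Returns the page on success; on failure
 * returns NULL and stores the errno from mmap() or mlock() in *err.
 */
static void *alloc_locked_page(size_t *len_out, int *err)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        *err = errno;
        return NULL;
    }
    if (mlock(buf, len) != 0) {
        *err = errno;              /* typically EPERM, ENOMEM or EAGAIN */
        munmap(buf, len);
        return NULL;
    }
    *err = 0;
    *len_out = len;
    return buf;
}

/* Wipe the secret, then unlock and unmap the page. */
static void free_locked_page(void *buf, size_t len)
{
    volatile unsigned char *p = buf;   /* volatile keeps the wipe from being optimized out */
    for (size_t i = 0; i < len; i++)
        p[i] = 0;
    munlock(buf, len);
    munmap(buf, len);
}
```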
To this end, Andrea Arcangeli posted a patch
which allows the system administrator to disable CAP_IPC_LOCK
checking via a sysctl variable. With those checks disabled, any
non-privileged process can lock pages into memory or allocate large-page
shared memory segments. Andrea asked for the patch to be incorporated into
the 2.6 mainline.
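In outline, the effect of Andrea's sysctl can be modeled as a single gate shared by the memory-locking and large-page shared memory paths. This is a user-space model, not the patch itself: the variable name `sysctl_disable_cap_mlock` and the helper functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sysctl; 0 (the default) keeps the check. */
static int sysctl_disable_cap_mlock = 0;

/* Shared gate: 0 if the operation is allowed, -EPERM if not. */
static int ipc_lock_permitted(bool has_cap_ipc_lock)
{
    if (sysctl_disable_cap_mlock)
        return 0;                  /* administrator switched the check off */
    return has_cap_ipc_lock ? 0 : -EPERM;
}

/* Both operations mentioned in the text pass through the same gate. */
static int may_mlock(bool has_cap_ipc_lock)
{
    return ipc_lock_permitted(has_cap_ipc_lock);
}

static int may_create_hugetlb_shm(bool has_cap_ipc_lock)
{
    return ipc_lock_permitted(has_cap_ipc_lock);
}
```

Setting the variable to 1 opens both paths to every process at once; that all-or-nothing quality is what the other approaches try to refine.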
The patch inspired some thinking on how best to make certain capabilities
available to users. There has been a
patch in circulation for a while which simply opens up memory locking
to everybody, but which puts a resource limit on the number of pages which
can be locked. The default limit is a single page, which is enough for gpg
but poses little threat to the system as a whole. With a suitably
adjusted limit, this patch should work for Oracle as well - but it does not
address the large-page shared memory issue.
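The resource-limit approach replaces a yes/no capability test with page accounting. The sketch below models that policy in user space; `mlock_allowed()` is a hypothetical helper, while `getrlimit()` and `RLIMIT_MEMLOCK` are the standard interfaces for reading the per-process limit (which is expressed in bytes, hence the conversion to pages). How the patch actually tracks locked pages is not shown and may differ.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical model of the rlimit-based policy: a process holding
 * CAP_IPC_LOCK may lock anything; everyone else may have at most
 * limit_pages pages locked in total.  Written so that
 * already_locked + requested cannot overflow.
 */
static bool mlock_allowed(bool has_cap_ipc_lock, size_t already_locked,
                          size_t requested, size_t limit_pages)
{
    if (has_cap_ipc_lock)
        return true;
    return requested <= limit_pages &&
           already_locked <= limit_pages - requested;
}

/* Current RLIMIT_MEMLOCK soft limit, converted from bytes to pages. */
static long memlock_limit_pages(void)
{
    struct rlimit rl;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return LONG_MAX;
    return (long)(rl.rlim_cur / (rlim_t)page);
}
```

With `limit_pages` set to 1, as in the default described above, a gpg-style process can lock its single page of passphrase storage, but a request for a second page is refused unless the process holds CAP_IPC_LOCK.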
William Lee Irwin put together a different
patch which allows the administrator to turn off checks for any
capability via a set of sysctl variables. It differs from Andrea's patch
in its generality, but also by virtue of using the security module
framework rather than direct changes to the kernel core. Some people
seemed to like this patch better, though there was some nervousness about
its overall security which led William to add a
strong comment and a lockdown capability
to the patch.
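William's more general scheme can be pictured as one override flag per capability plus a switch that freezes them. The sketch below is a hypothetical model: the array, the `lock_cap_overrides()` "lockdown" semantics (no further changes once set), and all names are assumptions made for illustration, not details taken from the patch. It does use the real number for CAP_IPC_LOCK (14) and 32-bit capability masks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CAPS      32   /* 2.6 capability masks are 32 bits wide */
#define CAP_IPC_LOCK  14   /* numbering as in <linux/capability.h> */

static bool cap_override[NUM_CAPS];  /* hypothetical per-capability sysctls */
static bool overrides_locked;        /* hypothetical lockdown switch */

/* Turn the check for one capability off (on == true) or back on. */
static int set_cap_override(int cap, bool on)
{
    if (cap < 0 || cap >= NUM_CAPS)
        return -EINVAL;
    if (overrides_locked)
        return -EPERM;               /* frozen until reboot in this model */
    cap_override[cap] = on;
    return 0;
}

/* Engage the lockdown; there is deliberately no way to undo it. */
static void lock_cap_overrides(void)
{
    overrides_locked = true;
}

/* Capability test as a security module hook might perform it. */
static bool capable_with_overrides(uint32_t effective, int cap)
{
    if (cap < 0 || cap >= NUM_CAPS)
        return false;
    if (cap_override[cap])
        return true;                 /* administrator disabled this check */
    return (effective >> cap) & 1u;
}
```

Once the administrator has enabled the overrides needed (say, CAP_IPC_LOCK for an Oracle host) and engaged the lockdown, a compromised root process could no longer quietly switch on further overrides; that is presumably the kind of concern a lockdown addresses.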
Given that the whole idea behind capabilities was to be able to give
specific capabilities to individual users, however, some developers
wondered why the current system couldn't be used. To this end, Andrew
Morton looked into hacking login to
enable it to give capabilities to users. He was not impressed with what he
found once he started trying to work with kernel capabilities:
It turns out that the whole "drop capabilities and then run
something" thing does not work in either 2.4 or 2.6. And hasn't
done since forever. What we have in there is no more useful than
I must say that I'm fairly disappointed that we developed and
merged all that fancy security stuff but nobody ever bothered to
fix up the existing simple capability code.
Particularly as, apparently, the new security stuff STILL cannot
solve the extremely simple Oracle-wants-CAP_IPC_LOCK requirement.
It was pointed out that SELinux can, in
fact, solve this problem. But that will be little comfort to those who are
not yet ready to adopt SELinux for their production systems.
The problem may originate from the fact that the visions of fully
capability-driven systems involve assigning capabilities to all executables
and having a process's capabilities tweaked every time a new program is
run. That part of the system has never been merged into the mainline,
partly because nobody has ever really figured out how to deal with system
administration when every file has another 32 permission bits added onto
it. The end result, in any case, is that the capability subsystem has
never worked quite as it should. Given that Andrew is the gatekeeper,
chances are good that a fix for that problem will get into the
kernel before any more complicated scheme for giving capabilities to users.
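To see why "drop capabilities and then run something" fails, it helps to model what happens to a process's three capability sets at exec() time. The sketch below follows the transformation rule the 2.6-era capability code applies, as we understand it: new permitted = (file permitted & bounding set) | (file inheritable & process inheritable), and new effective = new permitted & file effective. Because files carry no capability bits, the "file" sets are synthesized as all-or-nothing depending on whether the new program runs as root; that synthesis is simplified here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t cap_mask;                 /* one bit per capability */
#define CAP_BIT(n)        ((cap_mask)1u << (n))
#define CAP_IPC_LOCK_BIT  CAP_BIT(14)      /* CAP_IPC_LOCK is capability 14 */
#define CAP_ALL           ((cap_mask)0xFFFFFFFFu)

/* A process's (or a program's) three capability sets. */
struct cap_sets {
    cap_mask effective, permitted, inheritable;
};

/*
 * The sets "attached" to the program being run.  With no filesystem
 * support for capabilities they are synthesized: everything for root,
 * nothing for anyone else (a simplification of the real rules).
 */
static struct cap_sets file_caps_for(int runs_as_root)
{
    struct cap_sets f = { 0, 0, 0 };
    if (runs_as_root)
        f.effective = f.permitted = f.inheritable = CAP_ALL;
    return f;
}

/* New process sets after exec(); bset is the system-wide bounding set. */
static struct cap_sets exec_transform(struct cap_sets old, struct cap_sets file,
                                      cap_mask bset)
{
    struct cap_sets n;
    n.permitted   = (file.permitted & bset) |
                    (file.inheritable & old.inheritable);
    n.effective   = n.permitted & file.effective;
    n.inheritable = old.inheritable;       /* carried over unchanged */
    return n;
}
```

Under this model, a non-root process that keeps CAP_IPC_LOCK in every one of its sets still ends up with empty permitted and effective sets after exec(), because the synthesized file sets are empty. Only real per-file capability bits would change that outcome.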
One of the patches in Andrew Morton's "merge candidate" tree is the
lightweight audit framework. This patch, written by Rik Faith, is intended
to be a way for the kernel to get various types of audit information out to
user space without slowing things down, especially when auditing is not
being used. The framework is meant to serve as a complement to SELinux; it
is already being shipped as part of Fedora Core 2 test 2.
There are two kernel-side components to the audit code. The first is a
generic mechanism for creating audit records and communicating with user
space. All of that communication is performed via netlink sockets; there
are no new system calls added as part of the audit framework. Essentially,
a user-space process creates a NETLINK_AUDIT socket, writes
audit_request structures to it, and reads back audit_reply
structures in return.
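A minimal user-space sketch of the socket side follows. It does not reproduce the audit_request and audit_reply layouts from Rik's patch; instead it assumes the nlmsghdr framing and the AUDIT_GET status request from <linux/audit.h> as found on later kernels, which may differ from the patch as posted. The helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>    /* NETLINK_AUDIT, struct nlmsghdr, sockaddr_nl */
#include <linux/audit.h>      /* AUDIT_GET */

/* Fill in a header-only netlink request asking for audit status. */
static void build_audit_get(struct nlmsghdr *nlh, unsigned int seq)
{
    memset(nlh, 0, sizeof *nlh);
    nlh->nlmsg_len   = NLMSG_LENGTH(0);          /* header, no payload */
    nlh->nlmsg_type  = AUDIT_GET;
    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    nlh->nlmsg_seq   = seq;
}

/*
 * Open an audit netlink socket and send the request.  Returns the socket
 * (the caller then recv()s the reply and closes it) or -errno.
 */
static int send_audit_get(unsigned int seq)
{
    struct sockaddr_nl kernel;
    struct nlmsghdr req;
    int fd, err;

    memset(&kernel, 0, sizeof kernel);
    kernel.nl_family = AF_NETLINK;               /* nl_pid 0 addresses the kernel */
    build_audit_get(&req, seq);

    fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_AUDIT);
    if (fd < 0)
        return -errno;
    if (sendto(fd, &req, req.nlmsg_len, 0,
               (struct sockaddr *)&kernel, sizeof kernel) < 0) {
        err = errno;
        close(fd);
        return -err;
    }
    return fd;
}
```

On a kernel built without auditing, socket() would be expected to fail with EPROTONOSUPPORT; a permission problem is normally reported in a netlink error reply rather than by sendto() itself, so a real client has to read and check the reply.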
The generic part of the audit mechanism can control whether auditing is
enabled at all, perform rate limiting of messages, and handle a few other
tasks. On the kernel side, it provides a printk()-like mechanism
for sending messages to user space. This code also implements a
user-specified policy on what happens if memory is not available for
auditing; truly paranoid administrators can request that the kernel panic
in such situations.
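Two of those generic tasks, rate limiting and the out-of-memory policy, can be sketched in a few lines. This is a hypothetical model: a per-second message budget and a three-way failure policy (drop silently, fall back to printk(), or panic) are assumptions chosen to match the description above, and all names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Per-second message budget; max_per_sec == 0 means "no limit". */
struct audit_ratelimit {
    unsigned int max_per_sec;
    time_t window;           /* the second the current count belongs to */
    unsigned int count;      /* messages already sent in that second */
};

static bool audit_rate_ok(struct audit_ratelimit *rl, time_t now)
{
    if (rl->max_per_sec == 0)
        return true;
    if (now != rl->window) {         /* a new second: reset the budget */
        rl->window = now;
        rl->count = 0;
    }
    if (rl->count >= rl->max_per_sec)
        return false;                /* over budget: drop the message */
    rl->count++;
    return true;
}

/* What to do when no memory is available for an audit record. */
enum audit_fail_policy { POLICY_SILENT, POLICY_PRINTK, POLICY_PANIC };
enum audit_fail_action { ACTION_DROP, ACTION_PRINTK, ACTION_PANIC };

static enum audit_fail_action audit_on_failure(enum audit_fail_policy p)
{
    switch (p) {
    case POLICY_SILENT: return ACTION_DROP;
    case POLICY_PRINTK: return ACTION_PRINTK;
    case POLICY_PANIC:  return ACTION_PANIC;  /* the paranoid choice */
    }
    return ACTION_PRINTK;
}
```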
The audit patch includes some SELinux tweaks to make it use the audit
functions rather than printk() when it has something to log.
The audit logging code expects an audit daemon to be running to accept
messages via the netlink socket. Code for an example daemon is available
in Rik's Red Hat web
area. Should there be no daemon running, log messages are simply
passed to printk() instead.
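The printk()-like interface and the fallback described above can be modeled as a variadic logging function that formats the record and then picks a destination. The sketch is hypothetical throughout: `audit_log()`, `audit_daemon_pid`, and the sink enumeration are illustrative names, and the actual delivery over netlink or through printk() is left out.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum audit_sink { SINK_DAEMON, SINK_PRINTK };

/* Hypothetical: netlink port of the registered audit daemon, 0 if none. */
static int audit_daemon_pid = 0;

/*
 * printf-style entry point: format the record into out, then report where
 * it would be delivered.  A real implementation would hand the text to
 * the netlink socket or to printk(); this sketch only chooses the route.
 */
static enum audit_sink audit_log(char *out, size_t outlen, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(out, outlen, fmt, ap);
    va_end(ap);
    return audit_daemon_pid != 0 ? SINK_DAEMON : SINK_PRINTK;
}
```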
In addition to the generic support code, the audit patch includes a
mechanism for auditing system calls. One gets the sense that this was the
real purpose for the patch. System call auditing is off by default, but a
suitably privileged user-space process can turn it on and load a whole set
of rules describing what should be logged. Rules can test on various
attributes of the calling process, including its process ID, user and group
ID (both "real" and "effective"), etc. Rules can also be set to fire on
accesses to particular devices or files. Finally, rules can test
specific system call arguments, whether the call succeeded, and its
return value.
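The rule matching can be sketched as a filter that ANDs the conditions within a rule and ORs across rules. This is a hypothetical model, not the patch's filter code: only equality tests are shown, only the first system call argument is examined, and the field names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Attributes a rule can test (a subset of those the text lists). */
enum audit_field { F_PID, F_UID, F_EUID, F_GID, F_EGID, F_ARG0, F_SUCCESS, F_EXIT };

struct audit_cond { enum audit_field field; long value; };

struct audit_rule {
    int syscall;                  /* system call number, or -1 for any */
    size_t ncond;                 /* number of conditions used below */
    struct audit_cond cond[4];    /* all must hold (logical AND) */
};

/* What the kernel knows about a call when it returns. */
struct syscall_event {
    int syscall;
    long pid, uid, euid, gid, egid;
    long arg0;
    bool success;
    long exit;                    /* return value */
};

static long field_value(const struct syscall_event *e, enum audit_field f)
{
    switch (f) {
    case F_PID:     return e->pid;
    case F_UID:     return e->uid;
    case F_EUID:    return e->euid;
    case F_GID:     return e->gid;
    case F_EGID:    return e->egid;
    case F_ARG0:    return e->arg0;
    case F_SUCCESS: return e->success ? 1 : 0;
    case F_EXIT:    return e->exit;
    }
    return 0;
}

static bool rule_matches(const struct audit_rule *r, const struct syscall_event *e)
{
    if (r->syscall != -1 && r->syscall != e->syscall)
        return false;
    for (size_t i = 0; i < r->ncond; i++)
        if (field_value(e, r->cond[i].field) != r->cond[i].value)
            return false;
    return true;
}

/* An event is logged if any rule matches (logical OR across rules). */
static bool audit_should_log(const struct audit_rule *rules, size_t nrules,
                             const struct syscall_event *e)
{
    for (size_t i = 0; i < nrules; i++)
        if (rule_matches(&rules[i], e))
            return true;
    return false;
}
```

For example, a rule with `syscall` set to -1 and the conditions `{ F_UID, 500 }` and `{ F_SUCCESS, 0 }` would log every failed system call made by UID 500 while ignoring that user's successful calls.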
Included with the audit daemon is an auditctl utility which can be
used for setting and tweaking rules.
The audit mechanism will give system administrators a new tool for looking
at what is going on between user space and the kernel. With the addition
of some user-space utilities, it could become a powerful facility for
tracking down system problems and security issues - or for any number of
big-brotherish applications. Expect to see it in 2.6.6.
Page editor: Jonathan Corbet