Brief items
The current 2.6 development kernel is 2.6.27-rc2,
released on August 5.
There are a lot of changes here, many of which are fixes or include file
reorganizations (architecture-specific include files are moving from
include/asm-xxx to arch/xxx/include), but there's also a
driver for the SGI "GRU" system management device, support for the MIPS
architecture in the common kgdb debugger, a new subsystem for the
management of voltage and current regulators, some core memory management
and VFS locking changes, a driver for the SPI master controller on Orion
chips, and the removal of the long-deprecated cli() and sti() functions.
See the short-form changelog for details, or the full changelog for lots
of details.
As of this writing, no changesets have been merged into the mainline
repository since the 2.6.27-rc2 release.
The current -mm tree is 2.6.27-rc1-mm1. Recent changes
to -mm consist mainly of a large reduction in size as hundreds of patches
flow into the mainline.
The current stable 2.6 kernel is 2.6.26.2, released on August 6. It
contains a large set of fixes for a wide variety of problems. Previously,
2.6.26.1 (also a large set of
fixes) was released on August 1.
For 2.6.25 users: 2.6.25.14
(August 1) and 2.6.25.15
(August 6) continue the series of fixes for that kernel release.
Kernel development news
Now, the thing you should take away from this is: kernel people
have cool toys, and CPU's that are faster than what you
have. Nyaah, nyaah.
--
Linus Torvalds
Part of the problem I suspect is that the AV folks have managed to get
CIO's believe that all computer systems need to have anti-virus
software, of the same design that is needed for DOS/Windows systems.
This state of delusion is so bad that apparently some AV engineers
aren't even willing to reason from first principles what is necessary
or not to maintain a secure system.
And arguably, if the goal is security theater, much like the security
lines in airports, perhaps it doesn't matter. If there are silly
CIO's that are willing to pay for such a thing, regardless of whether
or not it is actually *necessary* to maintain security, one school of
capitalism would say it doesn't matter if it actually provides any
functional value or not.
On the other hand, it seems pretty clear there are plenty of LKML
developers who aren't buying it. :-)
--
Ted Ts'o
The Ottawa Linux power management summit was held on July 22, 2008 -
immediately prior to this year's Linux Symposium. For those who could not
be there, Len Brown has posted a set of notes from the meeting. The
discussion covered a wide variety of topics, including the OMAP3 processor,
snapshot boot, USB power management, server power management, and more.
One of the changes merged for 2.6.27 is a set of system call extensions
designed to get around some longstanding security issues with POSIX file
descriptors; LWN covered these extensions back in May. Now the author of
that work (Ulrich Drepper) has posted a description of these changes, why
they are important, and how they will
be used in the C library. Worth a look, especially for developers
working on threaded applications.
By Jake Edge
August 6, 2008
Adding new functionality to the kernel while maintaining the interfaces for
user space is the standard kernel development practice. Sometimes, though,
that can tickle bugs in user-space programs in unpleasant ways. When that
happens, it is clearly a regression—something that worked before no
longer does—but is it a kernel regression? In the end, it doesn't
matter, it seems, because the kernel needs to change to keep the user-space
program working, even at the expense of "ugliness".
Clearly, for purely internal kernel functionality, there is no
mandate for compatibility across kernel versions. But, when the user-space
interface is involved, things get a bit trickier. A change that
alters the way a documented interface works is essentially never done;
user-space interfaces are maintained forever.
When new functionality properly uses a documented interface, but breaks a
user-space program, it gets murkier.
That situation came up recently when Andrew Morton noticed that the linux-next tree broke the X
server on his laptop. The problem was quickly diagnosed as a problem in
the Synaptics touchpad driver for X. An array that was being passed to an
ioctl() was sized based on the number of bits, rather than bytes, it
should contain. Thus the maximum buffer length passed was off by a factor
of eight.
As a solution, Dmitry Torokhov offered up a patch, not to kernel
code, but to the synaptics X driver. That didn't sit
particularly well with Morton and others, eventually leading to a
pronouncement from Linus Torvalds:
If somebody has the commit that broke user space, that commit will be
_reverted_ unless it's fixed. It's that simple. The rules are: we don't
knowingly break user space.
Torokhov clearly felt that it was the driver, not his changes, that was at
fault, which is entirely understandable because it's true. That doesn't
alter the fact that new kernels would break existing, working
configurations on laptops everywhere. The kernel change just fully used an
existing, documented interface as Torokhov explained:
It is not like we broke ABI here. The program (synaptics driver) had a
grave bug. Older kernels happened to paper over the bug because they
did not fill the whole buffer that was advertised as available. Now
that we have more data to report the bug bit us.
Declaring an array of 64 bytes, but telling the kernel it can store up to
511 bytes into it is obviously a bug.
But, as Morton points out:
It really really doesn't matter what the causes are or which piece of
code is at fault or anything else like that.
What _does_ matter is that people's stuff will break. Apparently lots
of people's. That's a problem. A _practical_ problem. Can we
pleeeeeeze be practical and find some way of preventing it?
Since the code was in linux-next, it was targeted at the 2.6.28 kernel.
In Torokhov's thinking, this would allow something approaching six months
for distributions to update the synaptics driver. But that is a fundamental
misunderstanding of how and when kernels are upgraded—it is not only
by way of distributions. Introducing a change like this would result in
many messages to linux-kernel from unhappy folks with broken X servers.
Kernel hackers purposely build and run kernels on a wide variety of
hardware and distributions. That includes older distributions that no
longer get updates, so they would be stuck with the buggy driver, and thus
a non-working X server, essentially
forever. Obviously, they could rebuild the synaptics driver—kernel
hackers have been known to compile things other than kernels—but that
isn't the point.
There are major benefits to also having lots of regular users update their
kernels
frequently. Trying to ensure that there won't be any unnecessary barriers
to doing that can only help. Torvalds describes it this way:
And if we want to encourage people to upgrade their kernel very
aggressively (and we absolutely do!), then that means that we have to also
make sure it doesn't require them upgrading anything else.
Torvalds and Torokhov worked out a fix that preserved the old behavior for
a specific passed-in buffer length, while allowing the new events to be
delivered to any other users of the ioctl() that passed in the
proper length. Torvalds commented:
"Yeah, it's not pretty, but pragmatism before beauty."
It is, to some extent, a gray area. Regressions are bad for any number of
reasons, but maintaining hackarounds for buggy user-space programs has its own
set of problems. The hope is that eventually the need for the workaround
goes away so that it can be removed. It would seem difficult to determine
when the last user of the old synaptics driver finally upgrades, so this
code could be with us for a long time. Given the alternative, the
price seems worth it.
Though Torvalds was absolute in condemning any known regression,
even for programs that are clearly misusing an interface, there must be a
line somewhere. If some obscure program, with few users, gets broken by
the kernel doing something documented and reasonable, it is hard to imagine
that this kind of workaround will be required. This particular problem was
relatively easy to decide; the next might not be.
By Jonathan Corbet
August 4, 2008
Kernel developers will often use printk() to output a message when
something goes wrong. Such messages tend to be helpful to kernel
developers; if nothing else, they can be used to find the place in the
source where the message is emitted, and that, in turn, is most useful for
somebody trying to figure out what the message is really saying. So, if
your kernel tells you, for example, "lguest is afraid of being a guest," a
quick dig through the source turns up a comment reading "Lguest can't run
under Xen, VMI or itself. It does Tricky Stuff." Problem solved - or, at
least, understood.
But, for the bulk of Linux users and administrators, the act of
printk() interpretation by recourse to the kernel source is,
itself, Tricky Stuff. If the kernel cannot tell them directly what the
problem is, they would much rather have a more straightforward means
of translating messages into some sort of useful English.
Or maybe not: for many Linux users, English may not be much more helpful
than straight kernel-speak. It would be really nice to translate those
messages into some sort of useful French, or Chinese, etc. What it comes
down to, in the end, is that printk() alone will never be able to
provide sufficient information to users in a way which can be understood
and used to solve problems.
Just over one year ago, LWN looked at some proposals for
adding structure to kernel messages. After that, the discussion went
quiet, to the point that it seemed like not much was happening in the
messaging area. But one should not forget that we are dealing with
companies like IBM which have been creating massive binders full of kernel
message documentation for several decades. They're not going to give up so
easily. So the posting (by Martin Schwidefsky) of a new
kernel messaging proposal is not an entirely surprising event.
In the latest scheme, each source file which generates structured messages
defines a macro KMSG_COMPONENT as a string naming the specific
subsection. This name will often match the name of the module which is
created from that code, but that is not necessarily the case. The name,
once chosen, is supposed to remain fixed forevermore; it becomes, in
essence, part of the user-space interface and should always match the
documentation.
Then, each message is assigned an integer identification number. The
combination of the component name and the message number should be unique
throughout the kernel; it is used by various tools to associate a more
detailed explanation of whatever the message is intended to communicate.
The message number is used with one of a number of new
printk()-like functions:
kmsg_alert(id, format, args...);
kmsg_err(id, format, args...);
kmsg_warn(id, format, args...);
kmsg_info(id, format, args...);
kmsg_notice(id, format, args...);
kmsg_dev_alert(id, dev, format, args...);
/* ... */
The "_dev" versions take an additional struct device argument (like
dev_printk()) and encode the device name in the
resulting message. That message (for all variants) will include the
component name and the message number in any output. So, for example, the
S/390 "xpram" driver includes the following:
#define KMSG_COMPONENT "xpram"
/* ... */
if (devs <= 0 || devs > XPRAM_MAX_DEVS) {
	kmsg_err(1, "%d is not a valid number of XPRAM devices\n", devs);
	/* ... */
}
Should this particular error check trigger, the resulting message will look
like this:
xpram.1: 42 is not a valid number of XPRAM devices
Thus far, our user is probably not feeling much better informed than
before. But there is additional information which is made available
and associated with that message tag. In this particular case, it looks
like this:
/*?
 * Tag: xpram.1
 * Text: "%d is not a valid number of XPRAM devices"
 * Severity: Error
 * Parameter:
 *   @1: number of partitions
 * Description:
 * The number of XPRAM partitions specified for the 'devs' module parameter
 * or with the 'xpram.parts' kernel parameter must be an integer in the
 * range 1 to 32. The XPRAM device driver created a maximum of 32 partitions
 * that are probably not configured as intended.
 * User action:
 * If the XPRAM device driver has been compiled as a separate module,
 * unload the module and load it again with a correct value for the
 * 'devs' module parameter. If the XPRAM device driver has been compiled
 * into the kernel, correct the 'xpram.parts' parameter in the kernel
 * parameter line and restart Linux.
 */
Here, we have a more verbose description of the message. Even more
helpfully (one hopes), there is a discussion of what can be done to make
this message go away. This information can be provided within the source
or in a separate documentation file; it can also, presumably, be nicely
formatted and distributed to paying customers as a binder for the system
administrator's bookshelf. It can be translated into other languages for
Linux users worldwide (and beyond: one could have a lot of fun with the
Klingon translation for this kind of material).
The patch includes a script (written in Perl with undocumented messages, of
course) which (when invoked with make D=1) will go through
the source and make sure that every kernel message has an associated
description block; it can also format the descriptions into man pages if
desired. There are checks for missing descriptions or overloaded message
ID numbers; the script does not, at the moment, check for a change in the
message text.
Martin's first posting made this work specific to the S/390 architecture;
following a suggestion from Andrew Morton,
he made it generic in later versions. The cost of this work is zero for
those who do not use it, so there is a reasonable chance that it will find
its way into the mainline eventually. Before the message catalog system can be truly
useful, though, developers will have to go through and document a
substantial portion of the messages created by the kernel - and keep that
documentation current as the kernel evolves.
By Jonathan Corbet
August 6, 2008
The TALPA malware scanning API was
covered here in December, 2007.
Several months later, TALPA is back - in the form of
a patch set posted by a Red Hat
employee. The resulting discussion has certainly not been what the
TALPA developers would have hoped for; it is, instead, a good example of
how a potentially useful idea can be set back by poor execution and
presentation to the kernel community.
The idea behind TALPA is simple: various companies in the virus-scanning
business would like a hook into the kernel which allows them to check for
malware and prevent its spread. So the patch adds a hook into the VFS code
which intercepts every file open operation. A series of filters can be
attached to this intercept, with the most important one being a mechanism
which makes the file being opened available to a user-space process as a
read-only file descriptor. That process can scan the file and tell the
kernel whether the open operation should be allowed to proceed or not. In
this way, the scanning process can prevent any sort of access to files
which are deemed to contain bits with evil intentions.
There are a few other details, of course. A caching mechanism prevents
rescanning of unchanged files, increasing performance considerably. There
is also a hook on close() calls which can trigger the rescanning
of a file. Processes can exempt themselves from scanning if it might get
in their way; scanning can also be turned off for specific files, such as
those used for
relational database storage. But the patch set is relatively small, as it
really does not have that much to do.
This capability could well prove to be useful. Even if one is not
concerned about malware infections on Linux systems, a lot of files
destined for more vulnerable platforms can pass through Linux servers.
There is also the potential for the detection of attempted exploits of the
Linux host. Normally, in the Linux world, the way we respond to knowledge
of a specific vulnerability is to patch the problem rather than scan for
exploits, but there may be systems which cannot be restarted on short
notice, and which could benefit from an updated scanning database while
running code with known vulnerabilities. Also, as Alan Cox pointed out, this feature could be
useful for entirely different objectives, such as efficient indexing of
files as they change.
What might be best of all, though, is that this hook could replace a number
of rather less pleasant things being done by anti-malware vendors now.
Some of these products use binary-only modules, plant hooks into the system
call table, and generally behave in unwelcome ways. Moving all of that to
a user-space process behind a well-defined API could be beneficial for
everybody involved.
The patches have gotten a generally hostile reception on the kernel mailing
lists, though. Some developers are
uninspired about the ultimate objective:
So you are going to try to force us to take something into the
Linux kernel due to the security inadequacies of a totally
different operating system? You might want to rethink that
argument.
That's an objection which can be worked around; the kernel developers do
not normally want to determine which applications will or will not be supported by
the system as a whole.
Another objection, though, might be harder: this hook is said not to be the
best solution to the problem. Instead of putting a hook deep within the
VFS layer, the anti-malware people could simply hook into the C library
(perhaps with LD_PRELOAD), put the malware scanning directly into
the processes (mail clients or web servers, say) which are passing files
through the system, or embed the scanning into a stackable filesystem
implemented with FUSE (or a similar mechanism). That has led to
counterarguments that scanning implemented in this manner could be evaded
by a hostile application - by performing system calls directly, for
example, instead of going through the C library. Certain kinds of attacks,
it is said, could get around a purely user-space solution.
That argument, however, highlights the real problem with this posting. The
patch includes a set of 13 "requirements," including intercepting file
opens, caching results, exempting processes, and so on. But none of these
requirements describe the problem which is really being solved. In
particular, as noted by Al Viro and others,
there is no description of the threat which this patch is intended to
mitigate:
Various people had been asking for _years_ to define what the hell
are you trying to prevent. Not only there'd been no coherent
answer (and no, this list of requirements is _not_ that - it's
"what kind of hooks do we want"), you guys seem to be unable to
decide whether you expect the malware in question to be passive or
to be actively evading detection with infected processes running on
the host that does scanning.
If the scanning host could be infected, then a scanning mechanism which
could be circumvented by a rogue program is indeed a problem. But that is
a very different threat than simply trying to prevent evil attachments from
creating mayhem on Windows boxes; it does not appear to be a threat which
these patches are trying to address.
The lack of a clearly described problem has caused the discussion of these
patches to go around in circles; it is not possible to evaluate
(1) whether the goals of these patches are worth supporting, or
(2) whether the patches can actually be successful in achieving those
goals. The code, in other words, cannot be reviewed. Until the TALPA
developers can clarify that situation, their work will look like an example
of "shoot first, then aim." That kind of code tends not to make it
into the mainline, even if it could be useful in the end.
Page editor: Jonathan Corbet