Brief items
The current 2.6 kernel is 2.6.2. The most recent 2.6.3 prepatch is
2.6.3-rc2, which was released on
February 9. This prepatch is large, with many changes merged; the big
ones include more network driver cleanup work, a USB update (including the
removal of the USB scanner code), the new DMA pool abstraction (covered in
last week's LWN Kernel Page), an ACPI
update, an NFS update, and more. See the long-format changelog for the
details.
The removal of the USB scanner code has concerned some readers. It was
removed because it is broken and unmaintained, and because the accepted way
of driving USB scanners in 2.6 is via the user-space libusb library.
2.6.3-rc1 was released on February 6.
This one contained a lot of network driver cleanups, a number of
gcc-3.5 fixes, various architecture updates, a big ALSA update, and more;
once again, the long-format changelog has the
details.
Linus's BitKeeper tree contains some architecture updates, a filesystem
scalability improvement, some CPU frequency control updates, and a few
other fixes.
The current tree from Andrew Morton, as of this writing, is 2.6.3-rc1-mm1.
Recent additions include a lot of fixes and some performance improvements,
but little in the way of new features.
The current 2.4 kernel is 2.4.24; the first 2.4.25 release candidate
was announced on February 5.
The current stone-age kernel is 2.0.40, which was released by David Weinehall on
February 8. It contains some security fixes, so if you have any
systems still running 2.0 you may want to consider upgrading.
Kernel development news
The kernel development community has long been divided over the topic of
interactive debuggers. Many hackers find debuggers to be an indispensable
part of their development toolkits. Others claim that debuggers lead
people to fix symptoms rather than problems; rather than use such a crutch,
these people say, it is better to truly understand the code. Once you have
"become one" with the code, finding bugs is not that hard.
The latter view is held by Linus Torvalds, who explained his
approach in very clear terms back in 2000:
You can use a kernel debugger if you want to, and I won't give you
the cold shoulder because you have "sullied" yourself. But I'm not
going to help you use one, and I would frankly prefer people not to
use kernel debuggers that much. So I don't make it part of the
standard distribution, and if the existing debuggers aren't very
well known I won't shed a tear over it.
The end result is that there has never been support for interactive debuggers
in the mainline kernel - at least, for the more popular architectures.
The 2.6 kernel is now Andrew Morton's turf, however, and Andrew is more
open to the value of debugging tools. In fact, he has carried a version of
the kgdb patch in his -mm tree for a long time. Might Andrew merge kgdb
into the 2.6 kernel at some point?
The answer from Andrew seems to be "maybe":
I wouldn't support inclusion of i386 kgdb until it has had a lot of
cleanup, possible de-featuritisification and some thought has been
applied to splitting it into arch and generic bits. It's quite a
lot of work.
In other words, there is no disagreement with the idea of merging kgdb, but
the code needs some work first. Problems include a large number of
#ifdefs, and the fact that the patch is relatively intrusive,
touching many files. There are also objections to how the debugger works
with the virtual memory subsystem, especially for the i386 architecture.
All of these problems are probably solvable, given enough development
time. The interest in a mainline kgdb is probably high enough that the
cleanup work will happen, and kgdb may well be merged; a kgdb CVS
repository has been established for those
interested in this effort. A merge during the 2.6 series seems unlikely,
however; the work looks more likely to carry forward into 2.7.
Newcomers to the kernel code base are often surprised by the appearance of
(what seems to be) a bunch of calls to functions called likely() and
unlikely(). These calls always appear in conditional tests,
along these lines:
if (likely(some_condition)) {
    /* Do something */
}
In fact, likely() and unlikely() are not function calls
at all; instead, they are hints to the compiler. If the compiler knows
that one outcome is far more likely than the other, it can optimize the
code it generates accordingly. On some architectures, this information can
also be encoded into the object code, where it will override the branch
prediction normally done by the processor.
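In the kernel source, both names are macros built on GCC's
__builtin_expect(). What follows is a minimal user-space sketch of the same
idea, assuming GCC or a compatible compiler; sum_checked() is a made-up
function used only to show where such a hint might go.

```c
#include <stddef.h>

/* The !! turns any non-zero value into exactly 1, so the comparison
 * against the expected value inside the builtin behaves as intended. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sum an array, treating a NULL pointer as a
 * rare error case. Returns -1 on error. */
long sum_checked(const int *values, size_t n)
{
    long total = 0;
    size_t i;

    if (unlikely(values == NULL))
        return -1;          /* rarely taken: compiler may move it out of line */

    for (i = 0; i < n; i++)
        total += values[i];
    return total;
}
```

The hint changes only how the code is laid out, never what it computes; a
wrong guess costs performance, not correctness.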
David Woodhouse noted that the differing
interpretation of these directives by different architectures makes it hard
to know when likely() and unlikely() should be used. If
all a directive does is enable a bit of code optimization, it
should be used liberally whenever the programmer knows that one
outcome will happen more often than the other. On some architectures,
however, the cost of guessing wrong is fairly high, and these directives
should only be used where the odds are overwhelmingly in favor of one
outcome.
David's proposal is to replace likely() and unlikely()
with a new probable() macro:
probable(condition, percent)
Where "percent" is the programmer's estimation of how often the
condition will evaluate true. Each architecture could then decide what to
tell the compiler based on the given percentage.
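probable() is only a proposal, so the following is just a guess at how one
architecture might map the percentage onto the compiler's existing hint.
The 90% threshold, the PROBABLE_THRESHOLD name, and count_nonzero() are all
invented for illustration; GCC or a compatible compiler is assumed.

```c
/* Hypothetical per-architecture cutoff: only hint the compiler when the
 * programmer's estimate is at least this lopsided. */
#define PROBABLE_THRESHOLD 90

/* Sketch of the proposed macro. Only one arm of the conditional is ever
 * evaluated, so cond is evaluated exactly once. */
#define probable(cond, percent)                                  \
    ((percent) >= PROBABLE_THRESHOLD                             \
         ? __builtin_expect(!!(cond), 1)                         \
         : (percent) <= 100 - PROBABLE_THRESHOLD                 \
               ? __builtin_expect(!!(cond), 0)                   \
               : (long)!!(cond))

/* Hypothetical caller: most entries are expected to be non-zero. */
int count_nonzero(const int *v, int n)
{
    int i, count = 0;

    for (i = 0; i < n; i++)
        if (probable(v[i] != 0, 95))
            count++;
    return count;
}
```

An architecture where a wrong guess is cheap could lower the threshold; one
where it is expensive could raise it, without touching any callers.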
Rusty Russell has a more straightforward
answer, saying that these directives should rarely be used:
Sometimes, unlikely()/likely() help code readability. But
generally it should be considered the register keyword of the
2000's: if the case isn't ABSOLUTELY CRYSTAL CLEAR, or doesn't show
up on benchmarks, disdain is appropriate.
The "disdain" approach seems more likely to be adopted than a new macro.
There will be very few code paths where these directives will make a
measurable difference, and programmers often guess wrong about how often a
given code path will be taken.
David would also like to add a probability to the get_unaligned()
macro, which is used to access data which might not have the alignment
required by the processor. Some architectures can handle any alignment; on
those, get_unaligned() expands to a direct pointer dereference.
Others require that unaligned access be done via multiple, smaller fetches
or stores. Of those, some architectures can fix up an unaligned access
attempt in an exception handler, and others cannot. For architectures
which can fix unaligned accesses, it might be faster to take an occasional
exception if the probability of an unaligned access is small. Adding a
probability to the get_unaligned() macro (and
put_unaligned() as well) would allow each architecture to optimize
those accesses. Whether the resulting performance improvement would
justify the effort remains to be seen.
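The two basic strategies can be illustrated in portable user-space C. This
is a sketch, not the kernel's get_unaligned() or put_unaligned(); the
function names are invented, and the byte-wise variant assumes the data is
stored in little-endian order.

```c
#include <stdint.h>
#include <string.h>

/* Strategy 1: let the compiler choose. A fixed-size memcpy() is often
 * compiled to a plain load or store on hardware that tolerates any
 * alignment, and to something safer elsewhere. */
uint32_t load_u32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof(v));
}

/* Strategy 2: multiple smaller fetches. Four byte loads are always
 * aligned; the value is assembled assuming little-endian storage. */
uint32_t load_le32_bytes(const void *p)
{
    const uint8_t *b = p;

    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Neither version dereferences a misaligned uint32_t pointer directly, so
neither depends on the processor (or an exception handler) fixing things up.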
H. Peter Anvin wants to know if anybody is still using the old BSD pseudo
terminal ("pty") interface. These devices show up on most systems as
/dev/ptyXX; they were once used for applications like network logins. Most
applications on most Linux systems have not used BSD ptys for some years
now; instead, the newer /dev/pts devices are used.
Peter is asking because he has plans for the pseudo terminal subsystem;
he'd like to clean it up, make it more dynamic, and make use of the larger
device numbers available in 2.6. The need to maintain compatibility with
the BSD interface is, it seems, interfering with that work. So Peter would
like to remove the BSD pty interface if possible.
There have been a few complaints. The bootlogd utility used by
some distributions apparently uses BSD terminals in some cases. Truly old
systems may still use the old interface for network logins or terminal
emulator windows; this is not functionality that one breaks lightly. Peter
may yet find a way to maintain BSD pty support while making his other
changes. Even so, the BSD pty interface may be headed toward the end of its
life sometime in the 2.7 development series.
It has long been intended that the sysfs virtual filesystem would contain
information about all of the hardware (and more) installed on a given
system. Implementation of this intention has lagged in places, however,
and there are still parts of the system which lack sysfs support. One of
those areas is the frame buffer device code. In an attempt to fill in that
gap, James Simmons recently posted a patch adding sysfs support for frame
buffer devices; this patch was merged into 2.6.3-rc1.
There is only one problem with this patch: it can oops the kernel when
frame buffer driver modules are unloaded. The problem is the same one
which has afflicted other subsystem sysfs implementations: lifecycle
rules. Once a data structure has been exposed via sysfs, user space can
hold references to that structure indefinitely. Open sysfs files can
persist long after the underlying device has been removed from the system,
and long after the relevant module has been unloaded. If the behavior of
sysfs-exposed data structures has not been carefully laid out, the kernel
can be left holding references to structures or code which no longer
exist.
This sort of problem hit the networking subsystem hard. Once
net_device structures were exposed via sysfs, it was no longer
possible to allow individual network drivers to control what the lifecycle
of those structures is. As a result, it is now necessary to allocate all
net_device structures dynamically, and to let the networking
subsystem decide when and how to free those structures. The networking
code is also very careful not to access any module code after a
net_device has been shut down. The end result is that
net_device structures can persist in the system long after the
module which created them has been removed. It all works, but the cost was
a lengthy cleanup operation which has only now reached something close to
completion.
The frame buffer patches attempted to do things right from the beginning by
making the fb_info structure into a dynamic object. A support
function exists to allocate the structure, and it is automatically freed
when the last reference is removed. The only problem is that the frame
buffer drivers do not use this interface; they allocate and destroy
fb_info structures on their own. As a result, in the 2.6.3-rc1
(and -rc2) kernel, fb_info structures can be freed twice (or
statically-allocated structures can be freed once). That sort of error tends
to create displays on the frame buffer that the user does not want to see.
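The lifecycle rule at issue (the structure is freed when the last reference
goes away, and by nobody else) can be sketched in a few lines of user-space
C. This is not the fb_info code; struct demo_info and the demo_*()
functions are invented, and a plain int stands in for the atomic reference
count the kernel would use.

```c
#include <stdlib.h>

static int live_objects;        /* bookkeeping for the demonstration only */

struct demo_info {
    int refcount;               /* not atomic: single-threaded sketch */
    int xres, yres;             /* placeholder payload */
};

/* Allocate dynamically; the caller holds the first reference. */
struct demo_info *demo_alloc(void)
{
    struct demo_info *info = calloc(1, sizeof(*info));

    if (info) {
        info->refcount = 1;
        live_objects++;
    }
    return info;
}

/* Take an extra reference, as an open sysfs file would. */
struct demo_info *demo_get(struct demo_info *info)
{
    info->refcount++;
    return info;
}

/* Drop a reference; the structure is freed only when the last one goes. */
void demo_put(struct demo_info *info)
{
    if (--info->refcount == 0) {
        live_objects--;
        free(info);
    }
}

int demo_live_objects(void)
{
    return live_objects;
}
```

A driver that calls free() on such a structure itself, or that embeds it in
static storage, breaks the scheme in exactly the two ways described above.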
Fixing this problem requires updating every frame buffer driver to use
dynamically-allocated fb_info structures. James has stated his
intent to make this change. In the meantime, the "stable" kernel release
candidate has a known problem which will require a wide-ranging set of
changes to fix.
Al Viro, a master of this sort of transition, has grumbled that these changes should have been
done in the opposite order, so as to avoid breaking things. Others have
complained that this sort of change is too big for a stable kernel series
and should have waited for 2.7.
Yet another approach, however, would be to
use the "class_simple" interface, which was merged in 2.6.2-rc1. This
interface makes it easy to retrofit a /sys/class interface into
existing drivers without having to deal with some of the more complex
lifecycle issues. The interface is straightforward; one starts by creating
a class:
struct class_simple *class_simple_create(struct module *owner,
                                         char *name);
The owner argument should almost always be passed as
THIS_MODULE; the name will show up under
/sys/class. The resulting class can be removed at some later time
with:
void class_simple_destroy(struct class_simple *class);
Entries for individual devices can be added with:
struct class_device *class_simple_device_add(struct class_simple *class,
                                             dev_t dev,
                                             struct device *device,
                                             const char *fmt, ...);
Here, class is the class which was created above,
dev is the device number for the device,
device is a struct device structure for this device (it
can be NULL),
and the rest is a printk()-style format string to create the name
for the entry. The result (on success) is a sysfs directory with exactly
one attribute: a file called dev which contains the device
number. That is adequate for a tool like udev to create
corresponding device nodes.
The entry can be removed, of course:
void class_simple_device_remove(dev_t dev);
The whole thing works without maintaining references into the calling
driver, so most of the lifetime rule issues are avoided. More recent
changes to the class_simple interface (in 2.6.3-rc) include hotplug support.
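The order of calls in a driver's setup and teardown paths might look
roughly like the sketch below. So that it can be compiled and run outside
the kernel, the class_simple functions are replaced here by invented
user-space stand-ins; only their prototypes follow the interface described
above. The removal call is written as class_simple_device_remove(), the
name used in the 2.6 kernel source. The "example" class name, major number
240, and the example_*() functions are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>   /* dev_t */

/* ---- Invented stand-ins; none of this is kernel code ---- */
struct module;                                   /* opaque here */
struct device;                                   /* opaque here */
struct class_simple { const char *name; };
struct class_device { dev_t dev; char name[32]; int in_use; };

#define THIS_MODULE ((struct module *)NULL)
#define MKDEV(ma, mi) ((((dev_t)(ma)) << 20) | (mi))  /* stand-in macro */
#define MAX_ENTRIES 4

static struct class_device entries[MAX_ENTRIES];

struct class_simple *class_simple_create(struct module *owner, char *name)
{
    struct class_simple *cs = malloc(sizeof(*cs));

    (void)owner;
    if (cs)
        cs->name = name;
    return cs;      /* NULL on failure; the kernel uses ERR_PTR() instead */
}

void class_simple_destroy(struct class_simple *cs)
{
    free(cs);
}

struct class_device *class_simple_device_add(struct class_simple *cs,
                                             dev_t dev, struct device *device,
                                             const char *fmt, ...)
{
    va_list args;
    int i;

    (void)cs;
    (void)device;
    for (i = 0; i < MAX_ENTRIES; i++) {
        if (entries[i].in_use)
            continue;
        entries[i].in_use = 1;
        entries[i].dev = dev;
        va_start(args, fmt);
        vsnprintf(entries[i].name, sizeof(entries[i].name), fmt, args);
        va_end(args);
        return &entries[i];
    }
    return NULL;
}

void class_simple_device_remove(dev_t dev)
{
    int i;

    for (i = 0; i < MAX_ENTRIES; i++)
        if (entries[i].in_use && entries[i].dev == dev)
            entries[i].in_use = 0;
}

/* ---- What a driver's setup and teardown might look like ---- */
#define EXAMPLE_MAJOR 240                        /* hypothetical major number */
static struct class_simple *example_class;

int example_init(void)
{
    example_class = class_simple_create(THIS_MODULE, "example");
    if (example_class == NULL)
        return -1;
    /* Adds an entry named "example0" for device (EXAMPLE_MAJOR, 0). */
    if (class_simple_device_add(example_class, MKDEV(EXAMPLE_MAJOR, 0),
                                NULL, "example%d", 0) == NULL) {
        class_simple_destroy(example_class);
        return -1;
    }
    return 0;
}

void example_exit(void)
{
    /* Tear down in reverse order: device entries first, then the class. */
    class_simple_device_remove(MKDEV(EXAMPLE_MAJOR, 0));
    class_simple_destroy(example_class);
}

int example_entry_count(void)
{
    int i, n = 0;

    for (i = 0; i < MAX_ENTRIES; i++)
        n += entries[i].in_use;
    return n;
}
```

Note that removal is keyed on the device number alone, so the driver does
not need to hold on to the class_device pointer returned by the add call.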
Page editor: Jonathan Corbet