Brief items
The current development kernel is 2.6.0-test5, which was
released by Linus on September 8. Changes
this time include a new, type-safe
ioctl() command code checker (see
below), a USB "gadget" framework which enables the creation of user-space
drivers, a new
CONFIG_64BIT configuration option, a number of
futex improvements, a reworked de4x5 driver, "very basic" VIA 8237 serial
ATA controller support, support for a software-implemented hard disk
activity LED, Intel
High Precision
Event Timers support, Al Viro's first set of large
dev_t
support patches (covered here
two weeks
ago), and his second set (which fixes up filesystems and removes the
kdev_t type) as well, some IDE work, a large USB update, lots of
network driver fixes, a new set of iptables modules, and many other fixes.
The
long-format changelog has all the
details.
Linus's BitKeeper tree contains a number of patches including some
initramfs tweaks, improvements in random driver locking (which was
"consuming 60% of CPU resources in Anton's monster power5 boxes"), the
removal of some ext3 debugging hooks, direct I/O support for reiserfs, some
CPU frequency work, an Intel SpeedStep-SMI driver, and various fixes.
The current stable kernel is 2.4.22; Marcelo has not released any
2.4.23 prepatches since 2.4.23-pre3 on
September 3.
Kernel development news
Al Viro's second set of patches aimed at enabling the support of a larger
dev_t type has been merged into the 2.6.0-test5 kernel. The bulk
of the work is fixing up code in filesystems which made assumptions about
the size of
dev_t. As part of this whole process, however, Al has
been converting kernel code from the
kdev_t type over to using
dev_t directly.
kdev_t, of course, was introduced several major releases ago as a
way of hiding the actual structure of device numbers. The comments in
<linux/kdev_t.h> read:
As a preparation for the introduction of larger device numbers, we
introduce a type kdev_t to hold them. No information about this
type is known outside of this include file.
In practice it didn't work quite that way. When Linus changed the format of
kdev_t early in the 2.5 development series, everything broke.
And when the time came to really change the size of dev_t, it
turned out to be easier and more clear to simply use dev_t
directly. Kernel hackers tend to be skeptical of abstraction interfaces
which are created without being immediately useful; kdev_t is an
example of why that is so.
The seventh patch (of 15) in Al Viro's second
dev_t series changes the type of the much-used i_rdev
inode structure field; it is, of course, a dev_t now. Since Al
had already converted users of that field over to the new iminor()
and imajor() macros, the effect of this change was small. But, as
it turns out, i_rdev was the last kdev_t object in the
kernel. So patch eight removed the type
altogether.
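The value of going through accessors rather than poking at the bits can be seen from user space as well. The following is only a sketch: it uses the C library's major(), minor(), and makedev() macros (the user-space counterparts, not the kernel's imajor() and iminor()), and the dev_matches() helper and the device numbers are made up for illustration.

```c
#include <assert.h>
#include <sys/sysmacros.h>  /* major(), minor(), makedev() */
#include <sys/types.h>      /* dev_t */

/* Hypothetical helper: does dev carry the given major/minor pair? */
static int dev_matches(dev_t dev, unsigned int maj, unsigned int min)
{
    return major(dev) == maj && minor(dev) == min;
}

/* Self-check: the accessors undo makedev(), whatever the bit layout is. */
static void dev_selfcheck(void)
{
    dev_t dev = makedev(8, 1);   /* example numbers only */

    assert(major(dev) == 8);
    assert(minor(dev) == 1);
    assert(dev_matches(dev, 8, 1));
    assert(!dev_matches(dev, 8, 2));
}
```

Code written this way never needs to know how wide dev_t is or where the minor bits live, so it should survive a change in the format of the type.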
Out-of-tree drivers will, of course, be broken as a result of this change,
but the fixes should not be that difficult. At this point, the bulk of the
large dev_t preparation work should be done. About all that's
left is to decide what the format of the new dev_t will really be
and make the change. Once the dust settles, another one of the 2.6.0 "must
fix" items will have been taken care of.
The
ioctl() system call includes a general "command" argument
which specifies which operation the calling program wishes to perform. The
Linux kernel has long had a mechanism for defining these command arguments,
with the goal of keeping them all unique. If no two drivers implement the
same command codes, there is no danger of strange things happening when the
wrong code is passed to the wrong driver. A world where "rewind the tape"
for one driver never translates to "initiate self destruct" for another is a
safer place to be for all of us.
The Linux kernel takes things a little further by encoding some useful
information in the command codes. Along with driver-specific "magic" and
command numbers, the ioctl() command code includes the direction
of data movement (if any) between kernel and user space and the size of the
data to be moved. The kernel itself does not do anything with those
values, but their presence does enable a driver to perform some checks.
If, for example, the size of a structure used as an ioctl()
argument changes, the driver can use the size field in the command code to
determine whether the application is using the older version or not. Some
kernel code actually does check the sizes to be sure that things match up.
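As a rough illustration of what is packed into a command code, the sketch below pulls the four fields back out using the _IOC_DIR(), _IOC_TYPE(), _IOC_NR(), and _IOC_SIZE() macros from the user-space <linux/ioctl.h> header. The struct demo_pos structure, the DEMO_IOCGETPOS command, and the helper functions are hypothetical; they do not correspond to any real driver.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/ioctl.h>  /* _IOR, _IOC_DIR, _IOC_TYPE, _IOC_NR, _IOC_SIZE */

/* Hypothetical argument structure and command, for illustration only. */
struct demo_pos { long blkno; };
#define DEMO_IOCGETPOS _IOR('m', 3, struct demo_pos)

/* The four fields encoded in a command code. */
struct ioc_fields {
    unsigned int dir;   /* direction of data movement */
    unsigned int type;  /* driver "magic" code */
    unsigned int nr;    /* command number */
    unsigned int size;  /* size of the argument, in bytes */
};

static struct ioc_fields ioc_decode(unsigned int cmd)
{
    struct ioc_fields f;

    f.dir  = _IOC_DIR(cmd);
    f.type = _IOC_TYPE(cmd);
    f.nr   = _IOC_NR(cmd);
    f.size = _IOC_SIZE(cmd);
    return f;
}

/* The sort of check a driver could make on an incoming command. */
static int ioc_arg_size_ok(unsigned int cmd, size_t expected)
{
    return _IOC_SIZE(cmd) == expected;
}

static void ioc_decode_selfcheck(void)
{
    struct ioc_fields f = ioc_decode(DEMO_IOCGETPOS);

    assert(f.dir == _IOC_READ);
    assert(f.type == 'm');
    assert(f.nr == 3);
    assert(f.size == sizeof(struct demo_pos));
}
```

A driver whose argument structure has grown over time could compare the encoded size against the old and new structure sizes to work out which version the application was built against.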
The command codes are created using some macros in
<asm/ioctl.h>. A driver defining codes would use one
of these macros:
_IOR(type, number, size)
_IOW(type, number, size)
_IOWR(type, number, size)
The macro used specifies whether the ioctl() operation reads or
writes kernel-space data (or both); type is the driver's "magic"
code, and number is the command-specific code. The confusion
comes in with the argument called size; it is supposed to be the
type of the data to be passed between kernel and user space. So, for
example, the "get tape position" code is defined as:
#define MTIOCPOS _IOR('m', 3, struct mtpos)
The problem is that a number of hackers saw the size argument and
assumed that they were expected to pass the size of the expected data
transfer. The result was a number of definitions like:
#define CIOC_KERNEL_VERSION _IOWR('c', 10, sizeof (int))
As a result, the actual size value, as encoded within the command, was the
size of the size value itself (sizeof(size_t)), or, on most architectures, four bytes. Since most
code never looks at that size value, things worked, but the values defined
were not as intended. Another problem that occasionally came up was that
some code used very large size values, overflowing the space allotted in
the command word, thus corrupting the rest of the command code. Once
again, things worked, but not quite in the way people expected.
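Both mistakes are easy to reproduce from user space, where <linux/ioctl.h> still accepts anything that sizeof() will take. The sketch below is only a demonstration: the DEMO_ command names are made up, and the overflow case builds its code with _IOC() directly so that an oversized value can be passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/ioctl.h>  /* _IOWR, _IOC, _IOC_SIZE, _IOC_DIR, _IOC_SIZEBITS */

/* Hypothetical commands modeled on the patterns described above. */
#define DEMO_GOOD _IOWR('c', 10, int)          /* type: encodes sizeof(int) */
#define DEMO_BAD  _IOWR('c', 10, sizeof(int))  /* size: encodes sizeof(size_t) */

/* A size one larger than the size field can hold. */
#define DEMO_OVERFLOW _IOC(_IOC_READ, 'c', 11, 1U << _IOC_SIZEBITS)

static void ioc_size_selfcheck(void)
{
    /* Passing a type encodes that type's size. */
    assert(_IOC_SIZE(DEMO_GOOD) == sizeof(int));

    /* Passing sizeof(int) encodes the size of the sizeof result. */
    assert(_IOC_SIZE(DEMO_BAD) == sizeof(size_t));

    /* The oversized value spills out of the size field entirely... */
    assert(_IOC_SIZE(DEMO_OVERFLOW) == 0);
    /* ...and lands in the direction bits; the command number survives. */
    assert(_IOC_DIR(DEMO_OVERFLOW) != _IOC_READ);
    assert(_IOC_NR(DEMO_OVERFLOW) == 11);
}
```

Note that on a system where size_t and int are the same width, DEMO_GOOD and DEMO_BAD come out identical, which goes some way toward explaining how such definitions went unnoticed.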
One of the themes of 2.6 development has been the addition of type checking
anywhere that the compiler can be coerced into doing it. So the obvious
thing to do was to add checking to the generation of command codes; Arnd Bergmann
submitted a patch which does exactly that.
It adds a bit of preprocessor magic in the form of this macro:
#define _IOC_TYPECHECK(t) \
((sizeof(t) == sizeof(t[1]) && \
sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
sizeof(t) : __invalid_size_argument_for_IOC)
The first test ensures that an actual type (as opposed to a simple size)
has been passed in; the second makes sure it is not too large.
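The two tests can be tried out in isolation. The sketch below recasts them as a hypothetical DEMO_IOC_TYPE_OK() predicate; the real macro yields sizeof(t) on success and, judging by its name, refers to a deliberately invalid symbol otherwise so that the build fails. The two structures are made up for the occasion.

```c
#include <assert.h>
#include <linux/ioctl.h>  /* _IOC_SIZEBITS */

/*
 * Hypothetical predicate form of the two tests. For a type name t,
 * t[1] names an array of one t, which has the same size as t itself.
 */
#define DEMO_IOC_TYPE_OK(t) \
    (sizeof(t) == sizeof(t[1]) && \
     sizeof(t) < (1U << _IOC_SIZEBITS))

struct demo_small { int a; int b; };
struct demo_huge  { char bytes[1U << _IOC_SIZEBITS]; };  /* too large */

static void ioc_typecheck_selfcheck(void)
{
    assert(DEMO_IOC_TYPE_OK(int));
    assert(DEMO_IOC_TYPE_OK(struct demo_small));
    assert(!DEMO_IOC_TYPE_OK(struct demo_huge));
}
```

An expression such as sizeof(int) is not expected to get even this far: with the compilers we are aware of, the t[1] expansion should simply be rejected at compile time when t is not a type.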
All that remains is the inconvenient fact that the old, erroneous codes
have found their way into a number of application programs. Changing those
codes would break those applications, and that's something the kernel hackers
try never to do. So, for these cases, a new set of macros (with names like
_IOW_BAD()) has been introduced, and the erroneous uses have been
moved over to the new macros. The command codes remain unchanged, but the
mistake is noted so that it is not replicated when somebody copies the code
in question.
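How such a macro keeps the old numbers intact can be sketched as follows. This is an assumption about the approach, not a copy of the patch: DEMO_IOWR_BAD() is a hypothetical stand-in which simply takes sizeof() of whatever it is handed, with no type check, so a legacy definition moved over to it encodes exactly what it always did.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/ioctl.h>  /* _IOC, _IOWR, _IOC_READ, _IOC_WRITE, _IOC_SIZE */

/* Hypothetical stand-in for a "_BAD" macro: no type check at all. */
#define DEMO_IOWR_BAD(type, nr, size) \
    _IOC(_IOC_READ | _IOC_WRITE, (type), (nr), sizeof(size))

/* The legacy definition, moved over to the unchecked macro. */
#define DEMO_LEGACY DEMO_IOWR_BAD('c', 10, sizeof(int))

static void ioc_bad_selfcheck(void)
{
    /* Same number that the user-space header yields for the old form. */
    assert(DEMO_LEGACY == _IOWR('c', 10, sizeof(int)));

    /* The encoded size is still that of the sizeof result. */
    assert(_IOC_SIZE(DEMO_LEGACY) == sizeof(size_t));
}
```

"Correcting" the definition to pass int instead would change the encoded size wherever int and size_t differ in width, which is exactly the sort of application breakage being avoided.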
Patrick Mochel has posted
a new set of power
management patches. Power management is, of course, one of the last
unfinished projects in the 2.6.0-test kernel. So developments in that area
are of interest.
Much energy has gone into the suspend-to-disk implementation. Patrick has
been unable to come to an understanding with (2.6) swsusp maintainer Pavel
Machek; rather than keep trying, he has chosen to create his own
implementation (starting with swsusp) called "pmdisk." Should Linus accept
the patches, the 2.6.0-test kernel will have two separate, competing
implementations of the suspend-to-disk functionality. The swsusp version
has been reverted to its previous state; the patch includes the comment
"Note that I would never publically admit to putting such code into
the kernel."
The new pmdisk implementation has since seen some fixes, though it still
does not work on SMP systems, and apparently will not for some time. There
is a /sys/power/state file used to control pmdisk; writing
"disk" to that file will cause the system to suspend itself to
disk. Beyond that, pmdisk is still mostly the swsusp implementation with
a lot of cleanup work and the names of the functions and variables changed.
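For those who would rather not use a shell, the write to the control file might look something like the sketch below. The write_power_state() helper is hypothetical; it takes the path as a parameter so that it can be tried against an ordinary file first. Pointed at /sys/power/state with "disk" on a kernel carrying these patches, it would presumably suspend the machine, so some care is in order.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write a state name to a sysfs-style control
 * file. Returns 0 on success, -1 if the file could not be opened or
 * the write failed.
 */
static int write_power_state(const char *path, const char *state)
{
    FILE *f;
    int ok;

    assert(path != NULL && state != NULL);

    f = fopen(path, "w");
    if (f == NULL)
        return -1;

    ok = (fputs(state, f) != EOF);
    /* Buffered data is flushed at close; a failure there counts too. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A call like write_power_state("/sys/power/state", "disk") is the intended use; writing to that file normally requires root privileges.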
One remaining question with the suspend-to-disk functionality is what will
happen to all of Nigel Cunningham's work. Nigel has put a great deal of
effort into the 2.4 swsusp implementation, with the result that it has
become a reliable option for many users; see our
review of that work from August. Nigel would like to port his work
forward to 2.6, but is uncertain about what to port to.
This whole situation could be resolved by Linus, who has not yet accepted
the "fork swsusp" patch. Releasing a 2.6.0 kernel with two different
suspend implementations seems like a suboptimal course which could reflect
poorly on the Linux development process. Linus has made no public noises
to this effect, but it would not be surprising if he imposed some sort of
solution that led to a single suspend subsystem in 2.6.0.
Greg Kroah-Hartman has posted
a patch with
the rather uninspiring title of "add kobject to struct module." What the
patch really does, however, is enable the creation of a
/sys/module directory which will contain information about the
modules currently loaded into the kernel. With this patch, the only
available information (beyond the name of the module) is the reference
count, but that will be expanded in the future. Eventually all of the
information found in
/proc/modules will also appear in the
/sys/module tree, though in the standard sysfs "one value per
file" format. The values of parameters passed to the module will also be
made available for inspection and (permissions willing) change.
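The appeal of the "one value per file" format is that reading an attribute takes almost no parsing. The sketch below shows a hypothetical read_sysfs_long() helper; the attribute path used in the note that follows it is an assumption, since the patch posting is the authority on what the reference count file is actually called.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer from a sysfs-style
 * "one value per file" attribute. Returns 0 on success, -1 if the
 * file cannot be opened or does not start with a number.
 */
static int read_sysfs_long(const char *path, long *value)
{
    FILE *f;
    int matched;

    assert(path != NULL && value != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    matched = fscanf(f, "%ld", value);
    fclose(f);

    return matched == 1 ? 0 : -1;
}
```

Assuming the reference count shows up as something like /sys/module/<name>/refcnt, fetching it is a single call; getting the same number out of /proc/modules means finding the right line and picking the right column.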
This patch continues the process of moving system information from
/proc to /sys. It may take a couple more development
series worth of work, but /proc might just end up being pared down
to the process information it was originally created to hold.
One nice feature that was quietly slipped into the
2.6.0-test4-mm6 release is the
kgdb-over-ethernet patch, by Robert Walsh and San Mehat. As described in
the included documentation, kgdbeth makes it
frighteningly easy to hook into a running Linux kernel over the network and
prowl around in it. It's really just a matter of setting four boot parameters:
- gdbeth=number to set the device number of the ethernet interface to
use for debugging; this is usually zero, for eth0.
- gdbeth_remoteip to set the IP address of the machine which is
able to hook in with gdb.
- gdbeth_remotemac to set the remote system's MAC address.
- gdbeth_localmac to tell the kgdb stub what the local system's
MAC address is.
As one would expect, the target system will only respond to debugger
traffic coming from the system designated by the boot-time arguments. Once
you've booted a kernel with the kgdbeth patch and the proper parameters,
hooking in with gdb is simple. Here's a (slightly cleaned up) log from a
quick session done here at LWN Labs:
gdb ./vmlinux
(gdb startup stuff...)
(gdb) target remote udp:victim:6443
warning: The remote protocol may be unreliable over UDP.
warning: Some events may be lost, rendering further debugging impossible.
Remote debugging using udp:victim:6443
do_IRQ (regs=
{ebx = -1069465600, ecx = -1054087008, edx = -216755, esi = 624384,
edi = -1072664576, ebp = 581632, eax = 0, xds = 123, xes = 123,
orig_eax = -251, eip = -1072652202, xcs = 96, eflags = 582,
esp = -1072652057, xss = 0}) at arch/i386/kernel/irq.c:514
warning: shared library handler failed to enable breakpoint
(gdb) print ioport_resource
$2 = {name = 0xc0362e75 "PCI IO", start = 0, end = 65535, flags = 256,
parent = 0x0, sibling = 0x0, child = 0xc03a2a80}
(gdb) print *ioport_resource->child
$3 = {name = 0xc035d94f "dma1", start = 0, end = 31, flags = 2147483648,
parent = 0xc03a40e0, sibling = 0xc03a2a9c, child = 0x0}
(gdb) c
Continuing.
For anybody who has wanted to be able to use gdb on a running kernel, but
who has never gotten around to setting up the requisite serial lines and
such, kgdbeth promises to make things easier than ever.
Matt Mackall has noticed that a number of patches - including Ingo Molnar's
network console code and kgdbeth - each provide their own low-level
ethernet functions. Code which hooks into the kernel at such a fundamental
level needs to be able to send and receive packets without involving the
entire networking subsystem. As a way of addressing this duplication of
code and effort, Matt put together and posted a netpoll API. The patch came accompanied by new
versions of netconsole and kgdbeth, both of which are somewhat cleaned up
and significantly reduced in size. An added bonus is that netpoll supports
almost all interfaces out there without the need for any driver changes.
As of this writing, netpoll has not
found its way into an -mm release, but that could change.
Of course, Linus's feelings on kernel debuggers are well known, so kgdbeth,
while potentially useful for developers, is unlikely to find its way into
the 2.6 mainline. So Andrew Morton will have to keep this one in -mm. At
least, until Linus hands off the 2.6 kernel - to Andrew.
Patches and updates
Kernel trees
- Andrew Morton: 2.6.0-test4-mm6. "Dropped out Nick's CPU scheduler changes, brought back Con's interactivity
work."
(September 5, 2003)
Core kernel code
- Con Kolivas: O20.1int.
(September 10, 2003)
Development tools
Device drivers
Filesystems and block I/O
- Dave Kleikamp: JFS 1.1.2.
(September 7, 2003)
Networking
Architecture-specific
Security-related
Benchmarks and bugs
- Paul Larson: LTP nightly regression results for
2.6.0-test4,bk1,bk2,bk3,bk5,bk6,mm1,mm2,mm3-1,mm4,mm5,mm6.
(September 7, 2003)
Page editor: Jonathan Corbet