Kernel development
Brief items
Kernel release status
The current development kernel is 2.5.52, which was released by Linus on December 15. It consists mostly of fixes and updates, of course, but there's also a bunch of changes from Andrew Morton's "-mm" tree (including the long-term fix for the ext3 data=journal corruption bug), XFS and JFS updates, more module fixes, and a kconfig update. See the long-format changelog for the gory details.
The current stable kernel is 2.4.20; Marcelo released the second 2.4.21 prepatch on December 18. This large patch is mostly made up of ia-64 updates, but it also includes some NFS fixes, a couple of ext3 fixes, a bunch of stuff from the "-ac" tree, a new megaraid driver, and various other fixes and updates.
For those using very stable kernels: Alan Cox has announced the first 2.2.24 release candidate. It contains a handful of bug fixes, including one for a new denial of service vulnerability caused when somebody runs mmap() on a /proc/pid/mem file.
Kernel development news
How to speed up system calls
It all started with an observation that system calls on a modern Pentium 4 processor are far slower than on older CPUs. It seems that, for whatever reason, software interrupts generated with the int instruction are very slow with the P4 processor. Since x86 Linux invokes system calls with "int $0x80", that slowness makes itself felt - especially with system calls (like getpid()) that would, otherwise, be very fast.
There is an obvious solution to this problem: use the sysenter instruction instead. sysenter is quite a bit faster on modern Pentium processors. There are just a couple of problems: not all x86 processors support sysenter, and sysenter steps on registers in ways that can be hard to work around.
The lack of across-the-board support for sysenter is a problem. The kernel maintains a set of flags telling it what capabilities a given processor has; other processor-specific options are set at configuration time. System calls, however, are not invoked from the kernel - that is the C library's job. The last thing glibc needs is to be trying to figure out, at run time, the right way to invoke system calls.
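For the curious, the capability in question can also be read from user space. What follows is only a sketch, assuming an x86 processor and the <cpuid.h> helper header shipped with GCC and Clang; the function names are made up for illustration. It checks the SEP bit (EDX bit 11 of CPUID leaf 1), which is how the processor advertises sysenter. Note that the bit alone is not the whole story on every processor model, which is one more reason this decision is better left to the kernel.

```c
#include <stdio.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header; an assumption of this sketch */
#endif

/* CPUID leaf 1 reports the SEP feature (sysenter/sysexit) in EDX bit 11. */
#define CPUID_EDX_SEP (1u << 11)

/*
 * Hypothetical helper: returns 1 if the CPU advertises sysenter,
 * 0 if it does not, and -1 if the answer cannot be determined here.
 */
int cpu_advertises_sysenter(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;                      /* CPUID leaf 1 not available */
    return (edx & CPUID_EDX_SEP) ? 1 : 0;
#else
    return -1;                          /* not an x86 processor */
#endif
}

/* Hypothetical helper: print the result in human-readable form. */
void report_sysenter_support(void)
{
    int r = cpu_advertises_sysenter();

    printf("sysenter advertised: %s\n",
           r == 1 ? "yes" : r == 0 ? "no" : "unknown");
}
```

Calling report_sysenter_support() from main() prints a one-line answer for the machine at hand.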
Linus's solution to this problem is a patch which brings back a variant of an old idea. As of 2.5.53, the kernel will map a global, read-only page at the top of every process's address space. That page contains the optimal code for executing a system call on the current processor. Whenever glibc needs to call into the system, it simply sets up the registers and, rather than doing the old int $0x80, it jumps into the new page. The C library still needs to do a runtime test (since older kernels will lack this "vsyscall" page), but it need not concern itself with the detailed capabilities of different processors.
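The text above does not say how glibc performs its runtime test, so the following is not that code. It is a minimal sketch of the same idea as it can be tried on a current Linux system, where the kernel advertises the page it maps into each process through the ELF auxiliary vector. It assumes glibc 2.16 or later (for getauxval()) and the AT_SYSINFO_EHDR constant from <elf.h>; the two function names are invented for illustration.

```c
#include <elf.h>        /* AT_SYSINFO_EHDR */
#include <sys/auxv.h>   /* getauxval(); assumes glibc 2.16 or later */

/*
 * Hypothetical helper: address of the kernel-provided page as reported
 * in the auxiliary vector, or 0 if the kernel did not supply one.
 */
unsigned long kernel_entry_page(void)
{
#ifdef AT_SYSINFO_EHDR
    return getauxval(AT_SYSINFO_EHDR);  /* getauxval() returns 0 if absent */
#else
    return 0;
#endif
}

/*
 * Hypothetical helper: the runtime decision a C library has to make.
 * Non-zero means "go through the kernel's page"; zero means "fall back
 * to the old entry method".
 */
int use_kernel_entry_page(void)
{
    return kernel_entry_page() != 0;
}
```

The point of the design survives in this form: user space asks one question ("did the kernel give me a page?") and never has to know which instruction the kernel chose to put there.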
Keeping the registers straight turned out to be a trickier problem. The way sysenter steps on registers makes it hard to invoke system calls with more than five parameters. Various schemes were looked at, including creating a new "extra argument block" or simply requiring that six-argument system calls be invoked the old way. Linus finally came up with a tricky solution that makes it all work, however; those of you who like digging through x86 assembly may want to peek at his "absolutely wonderfully disgusting solution" to the problem. "I'm a disgusting pig, and proud of it to boot."
The result of all this: the gettimeofday() system call runs in just over half the time on a P4 processor. The speedup on Pentium 3 processors is smaller - a factor of 1.2 - but is still worthwhile.
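Numbers like these come from timing a cheap system call in a tight loop. The benchmark behind the figures above is not shown here; the sketch below is just one rough way to take a similar measurement on a Linux system. It assumes glibc's syscall() wrapper and a working CLOCK_MONOTONIC, and it uses getpid rather than gettimeofday() so that every iteration really enters the kernel. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: average wall-clock cost, in nanoseconds, of one
 * raw getpid system call over the given number of iterations. The
 * figure includes loop and timer overhead, so treat it as approximate.
 */
double getpid_cost_ns(unsigned long iterations)
{
    volatile long sink = 0;     /* keeps the calls from being optimized out */
    uint64_t start, end;
    unsigned long i;

    if (iterations == 0)
        return 0.0;

    start = now_ns();
    for (i = 0; i < iterations; i++)
        sink = syscall(SYS_getpid);
    end = now_ns();

    (void)sink;
    return (double)(end - start) / (double)iterations;
}
```

Running getpid_cost_ns() with a million or so iterations on two kernels, or two processors, gives a ratio that can be compared with the speedups quoted above; the absolute values will vary from machine to machine.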
Now that the vsyscall page is in place, will it be used for other things, such as implementing gettimeofday() entirely in user space? The answer, for now, appears to be "no". Getting a user-space gettimeofday() right is, seemingly, harder than it looks; there are synchronization issues, especially on some SMP systems where the clocks may not be synchronized by the hardware. So a user-space gettimeofday() appears to not be in the works, for now at least.
Whatever happened to the feature freeze?
While most people seem to think that the new system call mechanism makes sense, the question has come up: what kind of feature freeze are we in if we're adding things like a whole new way of doing system calls? Alan Cox, perhaps, had the most direct comment:
Given the high hopes that have been placed on this feature freeze actually working, this sort of remark is something to be concerned about.
Linus has acknowledged the concern, and started a discussion on how patches should be reviewed. Looking ahead:
There seems to be fairly widespread agreement, however, that this approach could be overly bureaucratic for now. Each development kernel release still contains hundreds of patches (636 for 2.5.51; in 2.5.52 there were "only" 153); people are understandably nervous about having that many patches go through a committee. Or even worse, being on the committee. Of course, Larry McVoy has an elaborate approach involving BitKeeper all planned out, but, given that a couple of people on the short list don't use BitKeeper, things will probably not go that way.
Andrew Morton has suggested simply adopting a set of guidelines for what can be accepted. The suggested list:
- Bug fixes
- Speedups
- In-progress features (or those Linus had already said would be merged)
- New drivers or filesystems
Anything outside of that list would not be included at this point. As the freeze gets harder, items are dropped off the list, until only bug fixes are left.
Given everybody's time constraints, the relatively informal approach is the most likely one to be adopted at this point. The important thing, in the end, is that everybody agrees that the feature freeze is important and is keeping an eye out for violations. As long as that continues, things will hopefully not get too far out of control.
Supporting hardware crypto in the kernel
Now that the kernel has its own cryptographic API, James Morris is thinking about how to support cryptographic hardware. A number of cards which perform cryptographic functions exist, and it would be nice to be able to make full use of these cards with a Linux system. Quite a few issues need to be considered on the way there, however, including:
- How should multiple cards be supported? This gets tricky, especially for session-oriented crypto operations.
- How should card failures (and resource exhaustion) be handled? The current crypto API isn't designed around this sort of failure.
- Some network cards can do their own IPsec processing; taking advantage of that capability may require a higher-level interface.
- User space may want to be able to use cryptographic devices as well, meaning that some sort of interface needs to be designed.
- Many devices lack useful programming documentation, which will make creating a Linux driver harder (or impossible).
And so on. Now is the time to get these decisions right; anybody who is interested in the interface to cryptographic hardware should probably have a look at James's posting and join the discussion.
Elks Distribution Edition 0.0.5 released
Don't throw away that old 80286 system yet - with the just-announced release of EDE (Elks Distribution Edition) 0.0.5, that system, too, can run Linux. EDE comes with a bleeding-edge 0.1.1 kernel and a new elkscmd package; see the announcement for the details. (Thanks to Alan Cox.)
Page editor: Jonathan Corbet
