See also: last week's Kernel page.
The current development kernel release is (still) 2.4.0-test9. On the prepatch side, 2.4.0-test10-pre5 came out on October 23. The bug fixing continues...
The current stable kernel release is still 2.2.17. The 2.2.18 prepatch is up to 2.2.18pre17. More fixes have gone in, but there's still a list of things that need to be dealt with before the real 2.2.18 release can happen.
Should applications be allowed to bind to any IP address? "Binding," of course, is how a server gets set up to accept connections. The current 2.4.0-test implementation allows binding to an arbitrary address - even if the system has no interface with that address. Such an action would seem to make no sense; if no interface can receive packets sent to an address, a server that has bound to that address will have very little to do.
The reasoning behind allowing that sort of binding is that an interface could conceivably come up in the future which does correspond to the given address. Not all interfaces are up all the time, and life may be simpler for servers if they do not have to be continually checking to see if the network is there yet.
There are a couple of problems with that behavior, though. It turns out that the POSIX standard requires that a bind to a nonexistent address fail. And it turns out that some applications try to bind to an address as a way of determining whether the address is local or not. The Java virtual machine, in particular, does this; the 2.4.0 semantics confuse it and cause the compatibility test to fail.
As a result, the ability to bind to nonexistent addresses will be going away. There will, however, be a sysctl option added that will allow the system administrator to restore that behavior if need be.
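The bind-as-locality-test trick described above can be sketched in a few lines. This is only an illustration, not the JVM's actual code: the function name is_local_address is hypothetical, and the sketch assumes IPv4 and a kernel that rejects binds to nonexistent addresses with EADDRNOTAVAIL.

```python
import errno
import socket


def is_local_address(ip: str) -> bool:
    """Return True if a UDP socket can be bound to `ip` on this host."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((ip, 0))  # port 0: let the kernel pick any free port
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False  # no interface carries this address
            raise  # anything else is a genuine error, not an answer
        return True
```

On a kernel that accepts binds to arbitrary addresses, this function returns True for every address it is given, which is exactly the confusion described above.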
A new Linux event handling interface? Readers of linux-kernel this week were treated to a lengthy discussion of how Linux makes event information available to applications, and the beginnings of a new interface that may improve on things in the future.
The mechanism used by most applications for tracking events is the poll() system call. poll() essentially takes a list of open files (and devices and network sockets...) and blocks until one or more of them is ready to perform I/O. The classic example of a user of poll() is the X window system server, which has a long list of client connections and must be able to respond to input events on any of them.
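For readers who have not used it, here is a minimal sketch of the poll() pattern, written against Python's select.poll wrapper rather than the C call. It assumes a Unix-like system; the helper name wait_readable and the default timeout are made up for the example.

```python
import select
import socket


def wait_readable(socks, timeout_ms=1000):
    """Block until at least one socket is readable; return those sockets."""
    poller = select.poll()
    by_fd = {}
    for s in socks:
        # Tell poll() which descriptors to watch, and for what.
        poller.register(s.fileno(), select.POLLIN)
        by_fd[s.fileno()] = s
    ready = poller.poll(timeout_ms)  # list of (fd, event mask) pairs
    return [by_fd[fd] for fd, mask in ready if mask & select.POLLIN]
```

A server would call something like this in a loop, handing it every client connection each time around and then servicing whichever sockets come back.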
Dan Kegel started things off by posting the results of some benchmarks he did with poll(). To stress things a bit, he tried an application watching 100, then 10,000 file descriptors on both Linux and Solaris. Solaris did rather better than Linux did; in particular, it showed only a factor of 6.5 difference in time between 100 and 10,000 sockets.
Some people were quick to downplay the results, pointing out that they almost have to indicate a large setup time on the Solaris side that will penalize programs polling a small number of sockets (which is most of them). Linus was in this camp:
Basically, for poll(), perfect scalability is that poll() scales by a factor of 100 when you go from 100 to 10000 entries. Anybody who does NOT scale by a factor of 100 is not scaling right - and claiming that 6.5 is a "good" scale factor only shows that you've bought into marketing hype.
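The "large setup time" argument can be made concrete with a little arithmetic. The sketch below assumes a purely linear cost model, time(n) = setup + per_fd * n; that model is an assumption made for illustration and is not taken from the benchmark data. The function name implied_setup_cost is likewise hypothetical.

```python
def implied_setup_cost(ratio, n_small, n_large):
    """Fixed cost, in units of per-descriptor cost, implied by a timing ratio.

    Assumes time(n) = setup + per_fd * n, and solves
    (setup + per_fd * n_large) / (setup + per_fd * n_small) = ratio
    for setup / per_fd.
    """
    if ratio <= 1:
        raise ValueError("ratio must exceed 1 for a finite setup cost")
    return (n_large - ratio * n_small) / (ratio - 1)
```

Under this model, a factor of 6.5 between 100 and 10,000 descriptors gives implied_setup_cost(6.5, 100, 10000) == 1700.0: the fixed overhead costs as much as scanning 1,700 descriptors, which would dominate any call that polls only a handful. A ratio of exactly 100 gives zero, the "perfect scalability" case in the quote above.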
Others pointed out that the Linux implementation of poll() is not ideal, since it requires four passes over the list of file descriptors: (1) reading them into kernel space, (2) querying drivers and setting up wait queues, (3) querying again after an event happens, and (4) copying results back to user space. Every pass over a large array hurts.
The Linux poll() implementation could probably be improved to perform fewer passes over the list. The real problem, though, is that poll() requires the system to pass over such a large array in the first place. To make things worse, the array is entirely under the application's control, so every call to poll() is like the first one. Clearly there is some room for improvement here, and this conversation got people thinking about a better way of doing things.
So Linus posted a new interface design reflecting one of those better ways. Read the posting for the details; in very simple terms, the proposed interface allows the application to tell the kernel about events of interest. The kernel maintains the list, and thus knows when the list changes. Each process has a queue of events waiting to be processed, which it may look at with a system call. Whenever an event actually happens (a network connection arrives, for example) the kernel adds it to the queue of every interested process - but only if an event of that type is not already on the queue.
The business about putting only one event of a given type on the queue is important. An event notification from the kernel means that one or more events are pending, and the application must be sure to deal with them all. This requirement makes life a little bit harder for applications, but much easier for the kernel. Among other things, the kernel need not worry about running out of memory should a large blast of network packets show up.
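The one-event-per-type rule is easy to model in user space. What follows is a toy model only; it is not the interface Linus proposed, and the class and method names (CoalescingEventQueue, register, post, get_events) are invented for the example.

```python
from collections import deque


class CoalescingEventQueue:
    """Toy per-process event queue holding at most one entry per event key."""

    def __init__(self):
        self._interest = set()   # keys the application asked to hear about
        self._pending = deque()  # queued keys, in arrival order
        self._queued = set()     # same keys, for fast duplicate checks

    def register(self, key):
        """Declare interest in a kind of event."""
        self._interest.add(key)

    def post(self, key):
        """Record that something happened; True if a new entry was queued."""
        if key not in self._interest or key in self._queued:
            return False  # nobody cares, or a notification is already waiting
        self._pending.append(key)
        self._queued.add(key)
        return True

    def get_events(self):
        """Hand all pending keys to the application and empty the queue."""
        events = list(self._pending)
        self._pending.clear()
        self._queued.clear()
        return events

    def __len__(self):
        return len(self._pending)
```

However many times post() is called for the same key, the queue never grows beyond the number of registered keys, which is the memory bound the paragraph above refers to. The price is that the application, on seeing one entry, must keep servicing that source until nothing is left.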
Of course, nothing much is new under the sun... Dan Kegel pointed out that Linus's scheme bears a strong resemblance to the FreeBSD kqueue mechanism. Linus's proposal has also evolved somewhat under discussion. Nobody has yet rushed out to implement this approach - it would be a 2.5 item in any case. But something along these lines will likely happen before too long. The fun of free software is that you can see it take form in the early stages.
Access Control Lists and extended attributes. Andreas Gruenbacher released version 0.7.0 of the Access Control List (ACL) patch. This release was the first stable release in some time... except that it was closely followed by 0.7.1 to fix up a few details.
On a more general level, Andreas also posted a proposal for the implementation of "extended attributes" (such as access control lists) in the Linux virtual filesystem. The ACL project has had an extended attribute patch for a while; they would now like to begin the process of getting it into the kernel.
Something will almost certainly go in at some point, but the extended attribute interface may well see some changes first. Stephen Tweedie posted a separate extended attribute specification which was evidently hammered out at the recent storage workshop in Miami. This version takes a wider view of things; it tries to handle things like the ACLs found on the NT filesystem and NTish identifiers that can be used by Samba. It's a complicated problem, and the kernel developers would like to solve it properly.
Once again, of course, this is 2.5 material, so there is some time to work out the details. The 2.6 kernel will likely have a much more extensive security scheme as a result.
KernelTrap.com hits the web. A new site called KernelTrap has turned up on the net. It is dedicated to kernel hacking in general, but its content is very much Linux-oriented.
Other patches and updates released this week include:
Section Editor: Jonathan Corbet
October 26, 2000