Looking forward to 2.6.18

Your editor, having returned from an all-too-short vacation, was faced with the prospect of looking over the 4500 (and counting) patches merged for the 2.6.18-rc1 release. Much of what has been merged is the usual set of fixes and updates, but some more user- and developer-visible patches have gone in as well. The user-visible patches include:

  • The new core time system has finally found its way into the mainline; it was covered here in January, 2005, but has evolved considerably since then.

  • New device drivers for SMSC LAN911x Ethernet chipsets, ZyDAS ZD1211-based wireless LAN adapters, Myricom Myri-10G interfaces, CS553x NAND flash controllers, Amstrad E3 Delta flash controllers, Abit uGuru hardware monitoring chips, NS LM70 temperature sensors, a number of Echoaudio sound cards, and more.

  • Generic support for hardware random number generators has been added, along with drivers for a long list of generators.

  • The Philips Webcam driver has seen a massive update which adds image decompression support (without legal issues this time), support for a number of new devices, and many improvements.

  • A large set of NFS patches has been merged, adding, among other things, direct I/O support.

  • A netlink interface for network bridge management.

  • A netfilter connection tracking helper for the SIP protocol.

  • The TCP Low Priority, TCP Compound, and TCP Veno congestion control algorithms.

  • A new mechanism for attaching SELinux labels to network packets. There is also a new set of hooks allowing SELinux to regulate the kernel key management subsystem.

  • Extended attribute support in the JFFS2 filesystem.

  • A number of kernel include files have been cleaned up to make it easier to include them into user-space applications.

  • PCI devices now export an "enable" attribute via sysfs. The main purpose for the new attribute is to allow the X server to enable and disable devices without doing direct I/O memory access.

  • The swapless page migration patches have been merged, easing the movement of pages between NUMA nodes. There is also a new move_pages() system call which can be used to determine where pages reside and possibly move them to a new node.

  • The TCP segmentation offload code has been updated and improved. There is a new "generic segmentation offload" layer which can emulate TSO in software; evidently this approach yields some of the performance benefits of TSO on hardware which does not support segmentation offloading.

  • The default disk I/O scheduler is now the "completely fair queueing" (CFQ) scheduler.

  • A massive set of serial ATA changes has been merged, including a new error handler, rewritten programmed I/O support, native command queueing (NCQ) support (which should improve performance considerably), and hotplug support.

  • Priority-inheriting futexes have been merged into the mainline.

  • SMPnice, a set of scheduler heuristic changes meant to improve handling of low-priority processes on SMP systems, has been merged.
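
As an illustration of how the congestion control algorithms in the list above might be put to use: an application can request a specific algorithm for a single socket with the TCP_CONGESTION socket option. What follows is a minimal user-space sketch, not code from the kernel tree. The helper names set_congestion_control() and get_congestion_control() are invented for this example, and the request will only succeed if the named algorithm is built in or loaded and the caller is allowed to use it.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

#ifndef TCP_CONGESTION
#define TCP_CONGESTION 13   /* fallback: value used by the kernel's TCP header */
#endif

/*
 * Hypothetical helper: ask the kernel to use the named congestion
 * control algorithm on one TCP socket. Returns 0 on success, or -1
 * with errno set (for example when the algorithm is not available).
 */
int set_congestion_control(int fd, const char *name)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                      name, (socklen_t)strlen(name));
}

/*
 * Hypothetical helper: read back the name of the algorithm attached
 * to the socket. The buffer is zeroed first and one byte is held back,
 * so the result is always NUL-terminated.
 */
int get_congestion_control(int fd, char *buf, socklen_t len)
{
    socklen_t optlen;

    if (len == 0)
        return -1;
    memset(buf, 0, len);
    optlen = len - 1;
    return getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &optlen);
}
```

A server wanting TCP Veno on one connection might call set_congestion_control(fd, "veno"), assuming that "veno" is the name under which that algorithm registers itself; "reno" is normally available on any system.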

Internal API changes visible to kernel developers include:

  • The generic IRQ layer has been merged. The SA_* flags to request_irq() have been renamed; the new prefix is IRQF_. A long series of patches has converted in-tree drivers over to the new names; the old names are scheduled for removal in January, 2007.

  • 64-bit resources are now supported. This change affects a number of users of the resource management API.

  • The kernel lock validator has gone in, along with a number of fixes for potential deadlocks found by the validator.

  • At long last, the devfs subsystem has been removed.

  • An API and support for the Intel I/OAT DMA engine.

  • The skb_linearize() function has been reworked, and no longer has a GFP flags argument. There is also a new skb_linearize_cow() function which ensures that the resulting SKB is writable.

  • Network drivers should no longer manipulate the xmit_lock spinlock in the net_device structure; instead, the following new functions should be used:

         void netif_tx_lock(struct net_device *dev);
         void netif_tx_lock_bh(struct net_device *dev);
         void netif_tx_unlock(struct net_device *dev);
         void netif_tx_unlock_bh(struct net_device *dev);
         int netif_tx_trylock(struct net_device *dev);
    

  • The long-deprecated inter_module API has finally been removed altogether.

  • A new kernel API providing access to the "inotify" functionality has been added.

  • The old scsi_request infrastructure has been removed, since there are no longer any in-tree drivers which use it.

  • The include file <linux/usb_input.h> is now <linux/usb/input.h>.

  • The VFS get_sb() filesystem method has a new prototype:

         int (*get_sb)(struct file_system_type *fstype, int flags,
                       const char *dev_name, void *data,
                       struct vfsmount *mnt);
    

    The mnt parameter is new; it allows the filesystem to receive a pointer to the target mount point structure. The mount point should be associated with the superblock in the get_sb() method with a call to:

         int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb);
    

    The return value of get_sb() has also been changed to an int error status. The various get_sb_*() convenience functions have had the same changes applied. The purpose of all this work is to allow NFS to share superblocks across mount points.

  • The statfs() superblock operation has a new prototype:

         int (*statfs)(struct dentry *dentry, struct kstatfs *stats);
    

    The old struct super_block pointer is now a dentry pointer instead.

  • Some functions have been added to make it easy for kernel code to allocate a buffer with vmalloc() and map it into user space. They are:

         void *vmalloc_user(unsigned long size);
         void *vmalloc_32_user(unsigned long size);
         int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
                                 unsigned long pgoff);
    

    The first two functions are a form of vmalloc() which obtain memory intended to be mapped into user space; among other things, they zero the entire range to avoid leaking data. vmalloc_32_user() allocates low memory only. A call to remap_vmalloc_range() will complete the job; it will refuse, however, to remap memory which has not been allocated with one of the two functions above.

  • The read-copy-update API is now accessible only to GPL-licensed modules. The deprecated function synchronize_kernel() has also been removed.

  • There is a new strstrip() library function which removes leading and trailing white space from a string.

  • A new WARN_ON_ONCE macro will test a condition and complain if that condition evaluates true - but only once per boot.

  • A number of crypto API changes have been merged, the biggest being a change to most algorithm-specific functions to take a pointer to the crypto_tfm structure, rather than the old "context" pointer. This change was necessary to support parameterized algorithms.

  • There is a new make target "headers_install". Its purpose is to install a set of kernel headers useful for libraries and user-space tools. A limited set of headers is installed, and those headers are sanitized on their way to the destination directory. It is hoped that distributors will use this mechanism to set up kernel headers for inclusion from user space in the future.
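
To make the behavior of the new strstrip() function in the list above concrete, here is a user-space sketch of the same idea. It is not the kernel's implementation; the name strstrip_sketch() is invented for this example, and the only behavior assumed is the one described above: leading and trailing white space are removed.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Illustrative user-space stand-in for a strstrip()-style helper.
 * Trailing white space is cut off by writing a NUL terminator into
 * the caller's buffer; leading white space is skipped by returning
 * a pointer further into that same buffer.
 */
char *strstrip_sketch(char *s)
{
    size_t len;

    assert(s != NULL);

    /* Trim trailing white space in place. */
    len = strlen(s);
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    s[len] = '\0';

    /* Skip leading white space; the loop stops at the terminator. */
    while (isspace((unsigned char)*s))
        s++;

    return s;
}
```

Since the returned pointer may not equal the pointer passed in, a caller which allocated the buffer needs to hold on to the original pointer in order to free it later.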

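The "complain only once" pattern behind WARN_ON_ONCE can be sketched in user space as follows. This is an illustration under stated assumptions rather than the kernel's macro: WARN_ON_ONCE_SKETCH, warnings_emitted, and count_negative() are names invented here, the macro relies on the GNU statement-expression extension (GCC and Clang), and "once per boot" becomes "once per call site per run of the program."

```c
#include <stdio.h>

/* Counts how many warnings were actually printed (for demonstration). */
static int warnings_emitted;

/*
 * Hypothetical user-space analogue of a warn-once macro. Each place
 * where the macro is expanded gets its own static flag, so every call
 * site complains at most once. The macro evaluates to the truth value
 * of the condition on every call, a choice made for this sketch.
 */
#define WARN_ON_ONCE_SKETCH(cond) ({                                  \
    static int warned_;                                               \
    int hit_ = !!(cond);                                              \
    if (hit_ && !warned_) {                                           \
        warned_ = 1;                                                  \
        warnings_emitted++;                                           \
        fprintf(stderr, "warning: %s at %s:%d\n",                     \
                #cond, __FILE__, __LINE__);                           \
    }                                                                 \
    hit_;                                                             \
})

/* Counts negative entries, warning about the first one only. */
int count_negative(const int *values, int n)
{
    int negatives = 0;
    int i;

    for (i = 0; i < n; i++)
        if (WARN_ON_ONCE_SKETCH(values[i] < 0))
            negatives++;
    return negatives;
}
```

Calling count_negative() on an array with several negative entries prints a single warning, however many times the function is called afterward.
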
As of this writing, the 2.6.18 merge window has closed, so there probably will not be a whole lot of additions to the above list.


Time for production kernels to enable TCP modules

Posted Jul 7, 2006 1:22 UTC (Fri) by gdt (subscriber, #6284) [Link]

It would be nice if the "enterprise" distributions could start enabling TCP algorithm modules. The major decisions about TCP are made by the transmitter of the data. That is, by the server, which is the usual target of enterprise distributions.

At the moment those servers can't choose anything but the default TCP algorithm; even if the servers are on high speed networks (where they might choose, say, HTCP), or serving customers who are mainly on wireless networks (Westwood TCP), or are doing large numbers of bulk file transfers (TCP-LP).

Looking forward to 2.6.18

Posted Jul 7, 2006 5:58 UTC (Fri) by jonabbey (subscriber, #2736) [Link]

Hm, I thought that there were some linux audit improvements due as well in 2.6.18.

I assume this list is just those features that have been incorporated since 2.6.18-rc1?

Looking forward to 2.6.18

Posted Jul 8, 2006 8:20 UTC (Sat) by dale77 (guest, #1490) [Link]

"The read-copy-update API is now accessible only to GPL-licensed modules.
The deprecated function synchronize_kernel() has also been removed."

What is this about? Why only allow code with a certain licence to access a
certain API?

Just curious, but it sounds a bit puritanical to me. Is there a good
practical reason?

Looking forward to 2.6.18

Posted Jul 8, 2006 20:48 UTC (Sat) by rahulsundaram (subscriber, #21946) [Link]

The RCU code is apparently covered by IBM patents which are licensed for use under the GPL only. Code under any other license would violate the patents. That seems a good enough reason to do this.

Looking forward to 2.6.18

Posted Jul 8, 2006 18:03 UTC (Sat) by afalko (subscriber, #37028) [Link]

The CFQ scheduler set to default? Why would they do that when Anticipatory is much faster than all the rest (as far as I know)? Perhaps they want to test the new NCQ features?

I am very happy that NCQ is now supported on SATA drives. I am actually surprised that my disks have been working very fast, and I would never have even guessed that NCQ was not even supported.

Looking forward to 2.6.18

Posted Jul 8, 2006 20:50 UTC (Sat) by rahulsundaram (subscriber, #21946) [Link]

RHEL 4 and several Fedora versions have been shipping with CFQ as the default.

The rationale is explained in
http://www.redhat.com/magazine/008jun05/features/schedulers/

There are other distributions that are evaluating CFQ or changing their default to it, too.

Looking forward to 2.6.18

Posted Jul 8, 2006 22:46 UTC (Sat) by zlynx (subscriber, #2285) [Link]

CFQ picked up some anticipatory features at some point.

It also has some nice things that I don't believe AS supports, such as I/O priorities. Using CFQ you can use ionice to set the priority of requests sent by particular processes. It defaults to the regular nice process priority. I/O priority is pretty cool for programs like Beagle (low priority) and multimedia players (high priority).
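
For the curious, a rough sketch of what a tool like ionice does underneath: it packs a scheduling class and a priority level into one integer and hands it to the ioprio_set() system call. The helper names below are invented for illustration, the constants are copied by hand rather than taken from a header, and the call is made through syscall() on the assumption that the C library offers no wrapper.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Constants mirrored by hand from the kernel's I/O priority header. */
#define IOPRIO_CLASS_SHIFT 13
enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE };
enum { IOPRIO_WHO_PROCESS = 1, IOPRIO_WHO_PGRP, IOPRIO_WHO_USER };

/* Pack a scheduling class and a level into one priority value. */
int ioprio_value(int io_class, int level)
{
    return (io_class << IOPRIO_CLASS_SHIFT) | level;
}

/* Extract the scheduling class from a packed priority value. */
int ioprio_class_of(int value)
{
    return value >> IOPRIO_CLASS_SHIFT;
}

/* Extract the level within the class from a packed priority value. */
int ioprio_level_of(int value)
{
    return value & ((1 << IOPRIO_CLASS_SHIFT) - 1);
}

/*
 * Hypothetical helper: set the I/O priority of the caller itself
 * (a "who" argument of 0 means the caller). Returns 0 on success,
 * or -1 with errno set.
 */
int set_own_io_priority(int io_class, int level)
{
    return (int)syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                        ioprio_value(io_class, level));
}
```

Something like Beagle could then call set_own_io_priority(IOPRIO_CLASS_IDLE, 0) at startup, while set_own_io_priority(IOPRIO_CLASS_BE, 7) asks for the lowest level within the ordinary best-effort class.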

Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds