Kernel development

Brief items

Kernel release status

The current 2.6 prepatch is 2.6.22-rc6, released by Linus on June 24. "I'm happy to say that things seem to have calmed down after -rc5, and that most of this really is just bugfixes and regression fixing in particular." This kernel development cycle would appear to be getting closer to its conclusion; the list of known regressions is getting short. As always, the long-format changelog has lots of details.

About 30 patches have been merged into the mainline git repository since the 2.6.22-rc6 release; they are fixes, mostly in the architecture-specific and USB code.

There have been no -mm releases over the last week, and no releases of any stable kernel trees.

Kernel development news

Quotes of the week

Quite frankly, I personally am considering removing "checkpatch.pl". That thing is just a nazi dream. That hard-coded 80-character limit etc is just bad taste.
-- Linus Torvalds

The problem IMO is that we are seeing less and less patch review but it needs to be more and more. Andrew is one of a handful of people who are reviewing lots of patches. It shouldn't be his wheelbarrow to have to push around all the time. So if a little automation can help Andrew, that's a good thing. Until people revolt, that is.
-- Randy Dunlap

Eliminating tasklets

Tasklets are a deferred-execution method used within the kernel; they were added in the 2.3 development series as a way for interrupt handlers to schedule work to be done in the very near future. Essentially, a tasklet is a function to be called (with a data pointer) in a software interrupt as soon as the kernel is able to do so. In practice, a tasklet which is scheduled will (probably) be executed when the kernel either (1) finishes running an interrupt handler, or (2) returns to user space. Since tasklets run in software interrupt mode, they must be atomic: they cannot sleep, access user space, and so on. So the work that can be done in tasklets is limited, but they are still heavily used within the kernel.

There is another problem with tasklets: since they run as software interrupts, they have a higher priority than any process on the system. Tasklets can, thus, create unbounded latencies - something which the low-latency developers have long been working to eliminate. Some efforts have been made to mitigate this problem; if the kernel has a hard time keeping up with software interrupts it will eventually dump them into the ksoftirqd process and let them fight it out in the scheduler. Specific tasklets which have been shown to create latency problems - the RCU callback handler, for example - have been made to behave better. And the realtime tree pushes all software interrupt handling into separate processes which can be scheduled (and preempted) like anything else.

Recently, Steven Rostedt came up with a different approach: why not just get rid of tasklets altogether? Since the development of tasklets, the kernel has acquired other, more flexible ways of deferring work; in particular, workqueues function much like tasklets, but without many of the disadvantages of tasklets. Since workqueues use dedicated worker processes, they can be preempted and do not present the same latency problems as tasklets; as a bonus, they provide a process context which allows work functions to sleep if need be. Workqueues, argues Steven, are sufficiently capable that there is no need for tasklets anymore.

So Steven's patch cleans up the interface in a few ways, and turns the RCU tasklet into a separate software interrupt outside of the tasklet mechanism. Then the tasklet code is torn out and replaced with a wrapper interface which conceals a workqueue underneath. The end result is a tasklet-free kernel without the need to rewrite all of the code which uses tasklets.
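
The shape of such a wrapper can be illustrated with a small userspace model. What follows is a sketch only; it is not Steven's patch and it is not kernel code. Every name ending in _model (and the demo_ handler) is invented for this example, the "workqueue" is a plain linked list drained by an explicit function call rather than by a worker thread, and there is no locking. The point is simply that callers keep a tasklet-style function-plus-data interface while the deferral underneath is a queued work item.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of "tasklet API on top of a workqueue". */

struct work_model {
    void (*func)(struct work_model *w);
    struct work_model *next;
    int pending;                    /* 1 while queued and not yet run */
};

/* One FIFO queue; in a real kernel a worker thread would drain it. */
static struct work_model *wq_head, *wq_tail;

/* Queue a work item once; returns 0 if it was already pending. */
static int queue_work_model(struct work_model *w)
{
    if (w->pending)
        return 0;
    w->pending = 1;
    w->next = NULL;
    if (wq_tail)
        wq_tail->next = w;
    else
        wq_head = w;
    wq_tail = w;
    return 1;
}

/* Stand-in for the worker thread: run the items queued so far.
 * Items queued by a handler while we run wait for the next call. */
static int run_worker_model(void)
{
    struct work_model *w = wq_head;
    int ran = 0;

    wq_head = wq_tail = NULL;
    while (w) {
        struct work_model *next = w->next;

        w->pending = 0;             /* cleared first, so a handler may requeue */
        w->func(w);
        ran++;
        w = next;
    }
    return ran;
}

/* The compatibility wrapper: a tasklet-shaped object hiding a work item. */
struct tasklet_model {
    struct work_model work;         /* first member: see tasklet_work_fn() */
    void (*func)(unsigned long data);
    unsigned long data;
};

static void tasklet_work_fn(struct work_model *w)
{
    /* Valid because 'work' is the first member of struct tasklet_model. */
    struct tasklet_model *t = (struct tasklet_model *)w;

    t->func(t->data);
}

static void tasklet_init_model(struct tasklet_model *t,
                               void (*func)(unsigned long),
                               unsigned long data)
{
    assert(func != NULL);
    t->work.func = tasklet_work_fn;
    t->work.next = NULL;
    t->work.pending = 0;
    t->func = func;
    t->data = data;
}

static void tasklet_schedule_model(struct tasklet_model *t)
{
    queue_work_model(&t->work);
}

/* Hypothetical handler used only to exercise the model. */
static unsigned long demo_total;

static void demo_handler(unsigned long data)
{
    demo_total += data;
}
```

In this model, scheduling the same tasklet twice before the worker runs results in a single call to the handler, and nothing runs until the worker gets around to it. That second property is the source of both the latency benefit and the performance concern discussed here.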

There is little opposition to the idea of eliminating tasklets, though it is clear that quite a bit of performance testing will be required before such a change could go into the mainline kernel. But almost nobody likes the wrapper interface; it is just the sort of compatibility glue that the "no stable internal API" policy tries to avoid. So there is a lot of pressure to dump the wrapper and simply convert all tasklet users directly to workqueues. Needless to say, this is a rather larger job; it's not surprising that somebody might be tempted to try to avoid it. In any case, the current patch is good for testing; if the replacement of tasklets will cause trouble, this patch should turn it up before anybody has gone to the trouble of converting all the tasklet users.

Another question needs to be answered here, though: does the conversion of tasklets to workqueues lead to a better interrupt handling path, or should wider changes be considered? Rather than doing a context switch into a workqueue process, the system might get better performance by simply running the interrupt handler as a thread as well. As it happens, the realtime tree has long done exactly that: all (OK, almost all) interrupt handlers run in their own threads. The realtime developers have plans to merge this work within the next few kernel cycles.

Under the current plans, threaded interrupt handlers would probably be a configuration-time option. But if developers knew that interrupt handlers would run in process context, they could simply do the necessary processing in the handler and do away with deferred work mechanisms altogether. This approach might not work in every driver - for some devices, it might risk adding unacceptable interrupt response latency - but, in many cases, it has the potential to simplify and streamline the situation considerably. The code would not just be simpler - it might just perform better as well.

Either way, the removal of tasklets would appear to be in the works. As a step in that direction, Ingo Molnar is looking for potential performance problems:

So how about the following, different approach: anyone who has a tasklet in any performance-sensitive codepath, please yell now. We'll also do a proactive search for such places. We can convert those places to softirqs, or move them back into hardirq context. Once this is done - and i doubt it will go beyond 1-2 places - we can just mass-convert the other 110 places to the lame but compatible solution of doing them in a global thread context.

This is a fairly clear call to action for anybody who is concerned about the possible performance impact of this change on any particular part of the kernel. If you think some code needs faster deferred work response than a workqueue-based mechanism can provide, now is not the time to defer the work of responding to this request.

Linux security non-modules and AppArmor

Long-time LWN readers will know that the Linux security module (LSM) API is controversial at best. To many, it has failed in its purpose, which is enabling the development of competing approaches to hardening Linux systems; the only significant in-tree security module remains SELinux. Meanwhile, the LSM interface is easily abused; since it allows the insertion of hooks into almost any system operation of interest, it can be used by other modules to provide non-security functionality. The LSM symbols are mostly exported GPL-only, but it is still possible for binary-only modules to abuse the LSM operations - and, apparently, some have done so.

SELinux hacker James Morris has been pondering this issue recently; he has also noticed that the in-tree security modules (SELinux and the small module implementing capabilities) cannot be unloaded. So, he asked, why implement a modular interface at all? He has posted a patch which turns LSM into a static API with no exported symbols. With this patch applied, any needed security "modules" must be built into the kernel; there is no longer any way to add them at run time.

There have been a few complaints, but, from your editor's point of view, it does not seem like anybody has come up with a compelling reason why it must be possible to unload security modules. Instead, it has been pointed out that maintaining a coherent security state when modules can be unloaded is nearly impossible. So this patch would appear to have reasonably good chances of being applied. The only question, perhaps, is whether the developers feel the need to provide an extended warning period for developers and users of out-of-tree security modules.

One such module is AppArmor - the GPL-licensed security mechanism distributed by Novell. AppArmor has remained out of the tree for a long time while its developers have tried to address the various comments which have been posted over the years. A new AppArmor patch has been posted; many things have been fixed, but one of the core points remains: AppArmor still uses a pathname-based mechanism for its policy enforcement. This approach sits poorly with developers - especially those in the SELinux camp - who think that pathnames are an inherently insecure method. In their view, the only truly secure way to control access to objects is to put labels on the objects themselves.

It seemed that this dispute had been resolved at the 2006 kernel summit, where it was determined that the use of pathnames was not enough to keep AppArmor out of the kernel. That has not stopped people from complaining, though, and those complaints redoubled when another pathname-based approach (TOMOYO Linux) was posted recently. If AppArmor does get into the mainline, it will have to be over the objections of developers who feel that it is providing false security to its users.

Andrew Morton appears to want to resolve this issue and get it off the mailing lists; he sees two alternatives:

a) set aside the technical issues and grudgingly merge this stuff as a service to Suse and to their users (both of which entities are very important to us) and leave it all as an object lesson in how-not-to-develop-kernel-features. [...]

b) leave it out and require that Suse wear the permanent cost and quality impact of maintaining it out-of-tree. It will still be an object lesson in how-not-to-develop-kernel-features.

It seems that Andrew would rather not be in the position of delivering object lessons on how not to develop kernel code by whatever means; he concludes with this request:

Sigh. Please don't put us in this position again. Get stuff upstream before shipping it to customers, OK? It ain't rocket science.

At the 2006 summit, Linus took a clear position that the use of pathnames for security policies seemed reasonable to him. Given that, along with the fact that AppArmor is being widely distributed, it seems that, sooner or later, this module should find a home in the mainline - even if it is no longer in modular form.

A summary of 2.6.22 internal API changes

The 2.6.22 development cycle is slowly heading toward its conclusion, meaning that it should be safe to try to list the significant internal API changes made this time around. They include:

  • The mac80211 (formerly "Devicescape") wireless stack has been merged, creating a whole new API for the creation of wireless drivers, especially those requiring software MAC support.

  • The eth_type_trans() function now sets the skb->dev field, consistent with how similar functions for other link types operate. As a result, many Ethernet drivers have been changed to remove the (now) redundant assignment.

  • The header fields in the sk_buff structure have been renamed and are no longer unions. Networking code and drivers can now just use skb->transport_header, skb->network_header, and skb->mac_header. There are new functions for finding specific headers within packets: tcp_hdr(), udp_hdr(), ipip_hdr(), and ipipv6_hdr().

  • Also in the networking area: the packet scheduler has been reworked to use ktime values rather than jiffies.

  • The i2c layer has seen significant new changes meant to make i2c drivers look more like drivers for other buses. There are, for example, new probe() and remove() methods for notifying devices when i2c peripherals come and go. Since i2c is not a self-describing bus, the support code still needs help to know where i2c devices might be; for many classes of device, this information can be had from the system BIOS.

  • The crypto API has a new set of functions for use with asynchronous block ciphers. There is also a new cryptd kernel thread which can run any synchronous cipher in an asynchronous mode.

  • The subsystem structure has been removed from the Linux device model; there never really was any need for it. Most code which was expecting a struct subsystem argument has been changed to use the relevant kset instead.

  • There is a new version of the in-kernel rpcbind (portmapper) client which supports versions 2-4 of the rpcbind protocol. The portmapper API has changed as a result.

  • Numerous changes to the paravirt_ops methods have been made. Additionally, paravirt_ops is no longer a GPL-only export.

  • There is a new memory function:

        void *krealloc(const void *p, size_t new_size, gfp_t flags);
    

    As one would expect, it changes the size of the allocated memory, moving it if need be.

  • The SLUB allocator has been merged as an experimental (for now) alternative to the slab code. The SLUB API generally matches slab, but the handling of zero-length allocations has changed somewhat.

  • A new macro has been added to make the creation of slab caches easier:

        struct kmem_cache *KMEM_CACHE(struct_type, flags);

    The result is the creation of a cache holding objects of the given struct_type (passed as the bare structure name, without the struct keyword), named after that type, and with the additional slab flags (if any).

  • The SLAB_DEBUG_INITIAL flag has been removed, along with the associated SLAB_CTOR_VERIFY flag passed to constructors. The result is a set of changes which ripples through quite a few source files. The unused SLAB_CTOR_ATOMIC flag is also gone.

  • The SuperH architecture has working kgdb support again.

  • The ia64 architecture has a new tool which will inject machine check errors into a running system. Not recommended for production machines.

  • The deferrable timers patch has been merged. There is also a new macro for initializing workqueue entries (INIT_DELAYED_WORK_DEFERRABLE()) which causes the job to be queued in a deferrable manner.

  • The old SA_* interrupt flags have not been removed as originally scheduled, but their use will now generate warnings at compile time.

  • There is a new list_first_entry() macro which, surprisingly, gets the first entry from a list.

  • The atomic64_t and local_t types are now fully supported on a wider set of architectures.

  • Workqueues have been reworked again. There is a new function:

        void cancel_work_sync(struct work_struct *work);
    

    This function tries to cancel a single workqueue entry, be it on the shared (keventd) or a private workqueue. Meanwhile run_scheduled_work() has been removed.
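
As an illustration of the list_first_entry() item above, here is a userspace sketch of how the macro fits together with the kernel's embedded-list idiom. The list helpers are minimal re-creations written for this example rather than the kernel's own header; struct request_model and first_request_id() are hypothetical. As far as your editor can tell, the macro is just list_entry() applied to head->next, which means it must only be used on a list known to be non-empty.

```c
#include <assert.h>
#include <stddef.h>     /* offsetof */

/* Minimal userspace re-creation of the kernel's circular list. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static int list_empty(const struct list_head *head)
{
    return head->next == head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    assert(entry != NULL && head != NULL);
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Recover the containing structure from a pointer to its member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_entry(ptr, type, member) \
    container_of(ptr, type, member)

/* The list must not be empty: head->next would be the head itself. */
#define list_first_entry(ptr, type, member) \
    list_entry((ptr)->next, type, member)

/* Hypothetical payload type with an embedded list link. */
struct request_model {
    int id;
    struct list_head link;
};

/* Return the id of the first queued request, or -1 for an empty queue. */
static int first_request_id(struct list_head *queue)
{
    if (list_empty(queue))
        return -1;
    return list_first_entry(queue, struct request_model, link)->id;
}
```

The explicit list_empty() check in first_request_id() is the caller's responsibility; the macro itself performs no such test.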

The LWN 2.6 API changes page is an ongoing list of API changes in the 2.6 development series.
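
The KMEM_CACHE() entry in the list above may be easier to picture with a userspace model of what such a macro does. This is a sketch under assumptions, not the kernel's definition: cache_create_model(), struct kmem_cache_model, KMEM_CACHE_MODEL(), and struct foo_model are all invented here, and malloc() stands in for the slab allocator. The idea being shown is that the macro takes a bare structure name and derives the cache's name, object size, and alignment from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a slab cache descriptor. */
struct kmem_cache_model {
    const char *name;       /* points at a string literal */
    size_t size;            /* size of each object */
    size_t align;           /* required object alignment */
    unsigned long flags;    /* extra creation flags, uninterpreted here */
};

/* Hypothetical stand-in for the cache-creation function. */
static struct kmem_cache_model *cache_create_model(const char *name,
                                                   size_t size, size_t align,
                                                   unsigned long flags)
{
    struct kmem_cache_model *c;

    assert(size > 0);
    c = malloc(sizeof(*c));
    if (!c)
        return NULL;
    c->name = name;
    c->size = size;
    c->align = align;
    c->flags = flags;
    return c;
}

/*
 * The convenience macro: '#' turns the structure name into the cache name,
 * while sizeof and _Alignof (C11) supply the geometry.
 */
#define KMEM_CACHE_MODEL(__struct, __flags)                     \
    cache_create_model(#__struct, sizeof(struct __struct),     \
                       _Alignof(struct __struct), (__flags))

/* Hypothetical object type to be cached. */
struct foo_model {
    long a;
    char b;
};
```

A call such as KMEM_CACHE_MODEL(foo_model, 0) thus yields a cache named "foo_model" without the caller having to repeat the name, size, or alignment by hand.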

Patches and updates

Filesystems and block I/O

  • Nick Piggin: fsblock. (June 24, 2007)

Page editor: Jonathan Corbet

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds