Kernel development

Brief items

Kernel release status

The current 2.6 prepatch is 2.6.21-rc5, released on March 25. It contains a number of fixes, including a set for timer-related regressions. Says Linus: "Those timer changes ended up much more painful than anybody wished for, but big thanks to Thomas Gleixner for being on it like a weasel on a dead rat, and the regression list has kept shrinking." See the long-format changelog for the details.

Several dozen fixes have been merged into the mainline git repository since -rc5 was released.

The current -mm tree is 2.6.21-rc5-mm2. Recent changes to -mm include a new lumpy reclaim patch, an updated staircase deadline (formerly RSDL) scheduler, a number of futex enhancements, and the integrity management patch set (see below).

The current stable 2.6 kernel is 2.6.20.4, released on March 23.

For older kernels: 2.6.16.45 was released with several fixes on March 26.

In the 2.4 world, 2.4.34.2 was released on March 24; it contains only two changes. 2.4.35-pre2 is also out with a rather larger set of fixes.

Kernel development news

Quotes of the week

Anyway, if it doesn't fix a bug it is nowhere near a high-priority patch for that seething bugfest which we like to call a kernel, so I'll drop it.
-- Andrew Morton

In [the] future, I'd recommend adding a witty comment to any such trivial patch: it's really the only way to get it featured on LWN's Kernel Quote of the Week.
-- Rusty Russell

In talking with a lot of different companies recently, I've come to the realization that we really need to do something about companies that violate the kernel's GPLv2 license. It has been a common criticism that "Well, our company abides by the GPL by releasing the code properly for our kernel modules, but what about all of those other companies that do not?" The companies that are good members of the community are getting a lot of pressure by people internal to them to stop releasing the code. This is justified by pointing to the companies that do not release their code as they are not having any "penalties" by doing this.
-- Greg Kroah-Hartman

Application-friendly kernel interfaces

The "hugetlb" feature of the kernel allows applications to create and use "huge" pages in memory. These pages use a special page table mode which allows a single page table entry to provide the translation for up to 16MB of contiguous memory (on some architectures). The advantage of doing things this way is that references to the entire huge page take up only one slot in the translation lookaside buffer (TLB), so applications with large working sets suffer fewer TLB misses, which can improve performance.
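To put a rough number on the TLB savings, here is a small arithmetic sketch. The 4KB base page size is an assumption (it is typical, but not stated above), and tlb_entries_needed() is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

/*
 * Number of TLB entries needed to cover `region` bytes using pages of
 * `page_size` bytes, rounding up. Both sizes are illustrative inputs.
 */
size_t tlb_entries_needed(size_t region, size_t page_size)
{
    return (region + page_size - 1) / page_size;
}
```

Under that assumption, covering 16MB takes 4096 entries with 4KB pages, but only a single entry with one 16MB huge page.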

Access to huge pages is through the hugetlbfs filesystem. Hugetlbfs is a virtual filesystem much like tmpfs, but with a twist: mappings of files within the filesystem use huge pages. It's not possible to do normal reads and writes from this filesystem, but it is possible to create a file, extend it, and use mmap() to map it into virtual memory. This interface gets the job done, but it's evidently a little too involved for some application programmers.
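The create, extend, and map sequence might look roughly like the following sketch. The function name map_file_region() is made up for illustration, and the code assumes a hugetlbfs instance is already mounted somewhere; nothing here is specific to hugetlbfs except the requirement that the length be a multiple of the huge page size.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or open) the file at `path`, extend it to `len` bytes, and map
 * it shared. If `path` lives on a mounted hugetlbfs, the mapping is backed
 * by huge pages and `len` must be a multiple of the huge page size.
 * Returns the mapping, or NULL on failure.
 */
void *map_file_region(const char *path, size_t len)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;

    /* The "extend" step: set the file size before mapping it. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

An application would call something like map_file_region("/mnt/huge/buffer", len), where /mnt/huge is a hypothetical mount point, and would still need to clean up the file afterward. That is the sort of "fuss" at issue here.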

To make life simpler, Ken Chen has proposed /dev/hugetlb. This device is much like /dev/zero, except that it uses huge pages. Applications can simply open the device and use mmap() to create as much huge-paged anonymous memory as they need. The patch is simple and seemingly uncontroversial; Andrew Morton did note, though:

afaict the whole reason for this work is to provide a quick-n-easy way to get private mappings of hugetlb pages. With the emphasis on quick-n-easy.

We can do the same with hugetlbfs, but that involves (horror) "fuss".

The way to avoid "fuss" is of course to do it once, do it properly then stick it in a library which everyone uses.

He goes on to observe, however, that getting yet another library distributed widely can be a difficult task - to the point that it's easier to just add more functionality within the kernel itself. He concludes: "This comes up regularly, and it's pretty sad."
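For comparison, the "quick-n-easy" pattern is the one long used with /dev/zero: open the device and map it privately. The sketch below is written against any such device; map_from_device() is a hypothetical helper, and whether /dev/hugetlb exists on a given system depends on the proposed patch being applied.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Open a /dev/zero-style device and privately map `len` bytes of it.
 * Returns zero-filled anonymous memory, or NULL on failure.
 */
void *map_from_device(const char *dev, size_t len)
{
    int fd = open(dev, O_RDWR);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);  /* no file to create, size, or delete afterward */
    return p == MAP_FAILED ? NULL : p;
}
```

With the patch applied, passing "/dev/hugetlb" instead of "/dev/zero" would presumably be the only change an application needs, with the length rounded to the huge page size.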

In a separate message, Andrew talked about how kernel interfaces should be designed in general:

The fact that a kernel interface is "hard to use" really shouldn't be an issue for us, because that hardness can be addressed in libraries. Kernel interfaces should be good, and complete, and maintainable, and etcetera. If that means that they end up hard to use, well, that's not necessarily a bad thing. I'm not sure that in all cases we want to be optimising for ease-of-use just because libraries-are-hard.

In many cases, the C library fills this role by providing a more application-friendly interface to kernel calls. But there are limits to how much code even the glibc developers want to stuff into the library, and things like a friendlier huge page interface may be on the wrong side of the line. A separate library for developers trying to do obscure and advanced things with the kernel might be the right solution.

The right solution, Andrew suggests, is to have a user-space API library which is maintained as part of the kernel itself. That would keep oversight over the API and help to ensure that the library is maintained into the future while minimizing the amount of code which goes into the kernel solely for the purpose of creating friendlier interfaces. Somebody would have to step up to create and maintain that library, though; as of this writing, volunteers are in short supply.

Deferrable timers

The dynamic tick code featured in the upcoming 2.6.21 kernel seeks to avoid processor wakeups by turning off the periodic timer tick when nothing is happening. Before stopping the clock, the kernel must decide when it should wake up again; this decision involves looking at the timer queue to see when the next timer expires. In the absence of other events (hardware interrupts, for example), the system will sleep until the nearest timer is due.

Many of these timers should, in fact, run as soon as the requested period has expired. Others, however, are less important - to the point that they are not worth waking up the processor. These non-critical timeouts can run some fraction of a second later (when the processor wakes up for other reasons) and nobody will notice the difference. So it would be nice if there were a way to tell the kernel that a specific timer does not require immediate action on expiration and that the processor should not wake up for the sole purpose of handling it.

Venki Pallipadi has created such a way with the deferrable timers patch. There is just one new function added to the internal kernel API:

    void init_timer_deferrable(struct timer_list *timer);

Timers which are initialized in this fashion will be recognized as deferrable by the kernel. They will not be considered when the kernel makes its "when should the next timer interrupt be?" decision. When the system is busy these timers will fire at the scheduled time. When things are idle, instead, they will simply wait until something more important wakes up the processor.
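The effect on the wakeup decision can be modeled in user space. What follows is a toy simulation, not the kernel's code: struct sim_timer, NO_WAKEUP, and next_idle_wakeup() are all invented for illustration, and jiffies wraparound is ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a pending kernel timer; not the real struct timer_list. */
struct sim_timer {
    unsigned long expires;  /* expiry time, in ticks */
    bool deferrable;        /* true: do not wake an idle CPU for this timer */
};

#define NO_WAKEUP (~0UL)

/*
 * Pick the tick at which an idle CPU must next wake up: the earliest
 * expiry among the non-deferrable timers. Deferrable timers are skipped.
 * Returns NO_WAKEUP if no non-deferrable timer is pending.
 */
unsigned long next_idle_wakeup(const struct sim_timer *timers, size_t n)
{
    unsigned long next = NO_WAKEUP;

    for (size_t i = 0; i < n; i++) {
        if (timers[i].deferrable)
            continue;
        if (timers[i].expires < next)
            next = timers[i].expires;
    }
    return next;
}
```

Given deferrable timers at ticks 100 and 180 and ordinary timers at 250 and 400, the simulated CPU sleeps until tick 250; the two deferrable timers then run late, once the processor is awake anyway.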

Venki appears to have gone to great lengths to minimize the changes required by this patch. In particular, the timer_list structure does not change at all. Instead, the low-order bit of an internal pointer (which is known to always be zero) is repurposed as a "deferrable" flag. The result is that the timer_list structure does not grow to support this new functionality, at the cost of requiring all code using the internal base pointer to mask out the "deferrable" bit.
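The pointer-tagging trick can be shown in a few lines of user-space C. This is a sketch of the general technique only; struct sim_base and the three helpers are hypothetical names, not the identifiers used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEFERRABLE_FLAG ((uintptr_t)1)

/* Toy stand-in for the timer base; any type aligned to 2+ bytes works. */
struct sim_base {
    long dummy;
};

/* Store the flag in bit 0 of the pointer value. */
uintptr_t tag_base(struct sim_base *base, bool deferrable)
{
    uintptr_t v = (uintptr_t)base;

    assert((v & DEFERRABLE_FLAG) == 0);  /* alignment leaves bit 0 free */
    return deferrable ? (v | DEFERRABLE_FLAG) : v;
}

/* Read the flag back without touching the pointer bits. */
bool tagged_is_deferrable(uintptr_t tagged)
{
    return (tagged & DEFERRABLE_FLAG) != 0;
}

/* Every user of the pointer must mask the flag off before dereferencing. */
struct sim_base *tagged_base(uintptr_t tagged)
{
    return (struct sim_base *)(tagged & ~DEFERRABLE_FLAG);
}
```

The cost mentioned above is visible in tagged_base(): code that forgets the mask ends up dereferencing an address that is off by one.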

The patch, as presented, only affects timers used within the kernel; no code has been changed to actually use deferrable timers yet. There could be potential in extending this interface somehow to user space. Our user space remains full of applications which feel the need to wake up frequently to check the state of the world; these applications are a real problem for power-limited systems. If those applications truly cannot be fixed, perhaps they could at least indicate a willingness to wait when nothing important is going on.

Integrity management in the kernel

Certain patches seem to pop up occasionally on the kernel lists for years. One of those is the whole integrity management patch set from IBM; these patches were last covered here in November, 2005. They are back for consideration yet again. Integrity management still looks like it is not ready for inclusion into the mainline, but it is getting closer; at some point it will force consideration of some interesting questions.

The core idea behind integrity management is providing some sort of assurance that the files on the system have not been messed with. David Safford described it this way:

[B]asically this integrity provider is designed to complement mandatory access control systems like selinux and slim. Such systems can protect a running system against on-line attacks, but do not protect against off-line attacks (booting Knoppix and changing executables or their selinux labels), or against attacks which find weaknesses in the kernel or the LSM module itself.

The current patches work, at the lowest level, by defining a new set of security module hooks for an "integrity provider." The provider can hook into system calls which access or execute files and check the integrity of those files; should it conclude that Bad Things have happened, access to the files can be denied. On top of that is the EVM ("extended verification module") code, which checks the integrity of files (and their metadata) by checksumming them and comparing the result with a value stored as an extended attribute. The IBAC (integrity-based access control) module can then use EVM and the LSM hooks to allow or deny access to files based on the conclusions reached by the integrity checker.
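The checksum-and-compare step can be illustrated with a toy model. Everything here is an assumption made for illustration: FNV-1a is a non-cryptographic stand-in for whatever the real code computes, struct sim_file fakes the extended attribute with an ordinary field, and none of the names come from the patch set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a: a non-cryptographic stand-in for the real checksum. */
uint64_t toy_checksum(const unsigned char *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;  /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;           /* FNV prime */
    }
    return h;
}

/* Toy file: contents plus the value that would live in an extended attribute. */
struct sim_file {
    const unsigned char *data;
    size_t len;
    uint64_t stored_sum;
};

/* Allow access only if the contents still match the stored value. */
bool integrity_ok(const struct sim_file *f)
{
    return toy_checksum(f->data, f->len) == f->stored_sum;
}
```

A plain stored checksum like this one could simply be recomputed by an attacker who modifies the file, which is why the signing arrangement matters.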

All of this can work using a passphrase supplied by the system administrator, but the intended mode of operation uses the trusted platform module (TPM) built into an increasing number of computers. With cooperation from the system's BIOS, the TPM can do an effective job of checksumming the software running on the system. The TPM also performs basic cryptographic functions, like signing the checksums used to verify the integrity of files. The key aspect of the system, though, is that the TPM can be set up to create these signatures only if the checksums for the running system match a set of pre-configured values. The end result is that the checksums associated with files cannot be changed on another system or by booting a different kernel - at least, not in a way which preserves their value as checksums. If the system holds together as advertised, it should be able to prevent attacks based on changing the files used by the system.
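The key property, that signatures are produced only when the running system matches the pre-configured values, can be modeled as follows. This is a deliberately simplified simulation: struct sim_tpm, toy_sign(), and sim_sign() are hypothetical, the "signature" is a toy keyed hash rather than real cryptography, and a real TPM's interface looks nothing like this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy keyed mix (FNV-1a over key and message); NOT a real signature. */
uint64_t toy_sign(uint64_t key, uint64_t message)
{
    uint64_t in[2] = { key, message };
    const unsigned char *p = (const unsigned char *)in;
    uint64_t h = 0xcbf29ce484222325ULL;

    for (size_t i = 0; i < sizeof in; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

struct sim_tpm {
    uint64_t running_sum;   /* checksum of the software actually booted */
    uint64_t expected_sum;  /* pre-configured value the key is bound to */
    uint64_t key;           /* secret that never leaves the simulated TPM */
};

/* Sign a file checksum only if the running system matches expectations. */
bool sim_sign(const struct sim_tpm *tpm, uint64_t file_sum, uint64_t *sig)
{
    if (tpm->running_sum != tpm->expected_sum)
        return false;       /* different kernel or system: refuse to sign */
    *sig = toy_sign(tpm->key, file_sum);
    return true;
}
```

Booting a different kernel changes running_sum in this model, so sim_sign() refuses and no valid signature for a modified file can be produced.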

Beyond that, this system supports remote attestation: providing a TPM-signed checksum to a third party which proves that only approved software is running on the system.

There are clear advantages to a structure like this. A Linux-based teller machine, say, or a voting machine could ensure that it has not been compromised and prove its integrity to the network. Administrators in charge of web servers can use the integrity code in similar ways. In general, integrity management can be a powerful tool for people who want to be sure that the systems they own (or manage) have not been reconfigured into spam servers when they weren't looking.

The other side of this coin is that integrity management can be a powerful tool for those who wish to maintain control over systems they do not own. Should it be merged, the kernel will come with the tools needed to create a locked-down system out of the box. As these modules get closer to mainline inclusion, we may begin to see more people getting worried about them. Quite a few kernel developers may oppose license terms intended to prevent "tivoization," but that doesn't mean they want to actively support that sort of use of their software. Certainly it would be harder to argue against the shipping of locked-down, Linux-based gadgets when the kernel, itself, provides the lockdown tools.

For now, that issue can be avoided; there are still plenty of more mundane problems with this patch set. But, sooner or later, the integrity management developers are going to get past the lower-level issues; they have certainly shown persistence in working on this patch. Based on his prior statements, Linus is unlikely to oppose the merging of these modules once they are ready. Whether the rest of the development community will be so welcoming remains to be seen.

Page editor: Jonathan Corbet

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds