Kernel development [LWN.net]

Kernel release status

The current 2.6 kernel is 2.6.3, which was released on February 17. Only a handful of patches have gone in since the last release candidate. Overall, 2.6.3 includes a great deal of internal cleanup work, the removal of the USB scanner driver (in favor of the user-space libusb solution), the new generic DMA pool mechanism, "context mount" support for SELinux, a big ALSA update, a fix for the new mremap() vulnerability, and quite a few architecture updates. See the long-format changelog for the details.

During the last week, we also saw the 2.6.3-rc3 and 2.6.3-rc4 release candidates.

The current kernel tree from Andrew Morton is 2.6.3-mm1. Recent additions to the -mm tree include some more scheduler improvements, a new CPU hotplug implementation, journaled quotas for the ext3 filesystem, and numerous fixes.

2.6.3-mm1 also contains the new device mapper crypto target code. This target allows the creation of encrypted filesystems by way of the device mapper (LVM) subsystem. If things work out, this approach is likely to replace the (buggy) cryptoloop driver; if you have an interest in encrypted filesystems, testing out this patch might be a good idea.

The current 2.4 kernel is 2.4.25, released by Marcelo on February 18. Among other things, this release includes the mremap() vulnerability fix. Marcelo has had a busy week, having previously released 2.4.25-rc2, -rc3, and -rc4.

Quote of the week

I suspect most samba developers are already technically insane... Of course, since many of them are Australians, you can't tell.

-- Linus Torvalds

The kernel and character set encodings

It all started as a JFS bug report. The JFS filesystem, it seems, gets upset when user space passes it file names encoded in the UTF-8 format. Rather than create or open a file with the name as given, it gives up and returns EINVAL. Patches which fix the problem have been posted, but the resulting discussion has taken rather longer to be resolved.

JFS has an "iocharset" option which can be used to state explicitly, at mount time, which character encoding is being used. There were calls on linux-kernel for this option to be added to other filesystems as well. The idea was rather strongly shot down, however, for a few reasons. One of those is that multiple users could be simultaneously using different character encodings on the same filesystem; a global option for the whole filesystem clearly will not be able to address that case.

The real reason, however, is that performing character set conversion requires the kernel to interpret the file name strings being passed to it from user space. The kernel hackers are very resistant to the imposition of any such policy; it would go against decades of Unix tradition. Officially, the kernel has no policy regarding which character set is being used for file names, content, or anything else. In each case, the kernel sees nothing more than a stream of bytes.

That said, the kernel does have some policies regarding file names: they use "/" as a directory delimiter, and they are terminated by a NUL (zero) byte. This policy rules out the use of many encodings which are sometimes employed to represent non-ASCII characters; the fixed-width wide encodings all tend to use lots of bytes containing zero. In reality, the only practical choices for representing characters beyond the ASCII set are iso-8859-1 (which allows the representation of characters used in many continental European languages) and UTF-8, which can encode pretty much anything.
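The two byte-level rules described above can be illustrated with a small user-space sketch. The function name usable_as_path_component() and the sample arrays are hypothetical and exist only for this illustration; nothing here is kernel code. It checks a byte sequence for the two values the kernel treats specially and applies the check to the same character in several encodings.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true if the byte sequence could be handed to the
 * kernel as a single path component, meaning it contains neither a zero
 * (terminator) byte nor a '/' (separator) byte.
 */
static bool usable_as_path_component(const unsigned char *bytes, size_t len)
{
    if (len == 0)
        return false;
    for (size_t i = 0; i < len; i++) {
        if (bytes[i] == 0x00 || bytes[i] == '/')
            return false;
    }
    return true;
}

/* U+00E9 (e with acute accent) in three encodings. */
static const unsigned char e_acute_latin1[]  = { 0xE9 };
static const unsigned char e_acute_utf8[]    = { 0xC3, 0xA9 };
static const unsigned char e_acute_utf16le[] = { 0xE9, 0x00 };

/* U+012F (i with ogonek) in UTF-16LE: its low byte, 0x2F, is ASCII '/'. */
static const unsigned char i_ogonek_utf16le[] = { 0x2F, 0x01 };
```

Calling usable_as_path_component() on the iso-8859-1 and UTF-8 arrays returns true, while both UTF-16 arrays are rejected: one contains a zero byte and the other contains a byte which the kernel would read as a directory delimiter.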

UTF-8 is relatively easy to use; for US users it looks just like ASCII, but it can handle a far wider range of characters while not breaking (most) code which uses traditional C strings. Thus it is often said that UTF-8 is the encoding used by the Linux kernel. That statement is a mistake, however: Linux does not use any particular encoding. If user space uses UTF-8 to represent extended characters, everything will work. But nothing forces user space to work in that way.

This approach keeps policy out of the kernel, but some developers are not entirely happy with it. The lack of policy can lead to user-space confusion in a number of ways. For example, if a user creates a file called WéîrdÑàmë, that name could be represented in the filesystem in more than one way. Depending on how user space is configured, it could choose either iso-8859-1 or UTF-8; the encoding of that name will be quite different depending on that choice. A different user space could interpret the file name differently in the future, resulting in unreadable filenames and confused users. The kernel, lacking a character encoding policy of its own, will do nothing to help prevent this situation.
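To make the difference concrete, here is a minimal sketch which converts the iso-8859-1 bytes of that file name into UTF-8. The function latin1_to_utf8() and the array weird_latin1 are hypothetical names chosen for this example. The conversion relies on the fact that each iso-8859-1 byte value is also the Unicode code point of the character it represents.

```c
#include <stddef.h>

/* "WéîrdÑàmë" as a user space configured for iso-8859-1 would store it. */
static const unsigned char weird_latin1[] = {
    0x57, 0xE9, 0xEE, 0x72, 0x64, 0xD1, 0xE0, 0x6D, 0xEB
};

/*
 * Hypothetical helper: convert iso-8859-1 bytes to UTF-8.
 * Returns the number of bytes written, or 0 if out_cap is too small.
 */
static size_t latin1_to_utf8(const unsigned char *in, size_t in_len,
                             unsigned char *out, size_t out_cap)
{
    size_t n = 0;

    for (size_t i = 0; i < in_len; i++) {
        unsigned char b = in[i];

        if (b < 0x80) {
            /* ASCII range: identical in both encodings. */
            if (n + 1 > out_cap)
                return 0;
            out[n++] = b;
        } else {
            /* U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx. */
            if (n + 2 > out_cap)
                return 0;
            out[n++] = (unsigned char)(0xC0 | (b >> 6));
            out[n++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    return n;
}
```

The nine-byte iso-8859-1 name becomes fourteen bytes in UTF-8, since each of the five accented characters turns into a two-byte sequence (é goes from 0xE9 to 0xC3 0xA9, for example). The kernel stores whichever byte string it is given and considers the two names to be unrelated.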

Confusion over character sets can also facilitate the creation of security holes; code which attempts to clean up file names can fail if evil characters are given in an unexpected encoding. Code which expects UTF-8 must also be careful when dealing with the Linux kernel because the kernel itself makes no effort to ensure that any string is, in fact, a legal UTF-8 encoding.
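Since the kernel will not do the checking, user-space code which expects UTF-8 has to do it. What follows is a minimal sketch of such a check, written against the rules in RFC 3629; the function name is_valid_utf8() is hypothetical, and real applications may prefer a well-tested library routine. The rejection of "overlong" sequences is the part that matters for security: the bytes 0xC0 0xAF, for example, are an illegal two-byte spelling of "/" which a naive filter could let through.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if s[0..len) is well-formed UTF-8 (RFC 3629). */
static bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = s[i];
        unsigned long cp, min;
        size_t extra;

        if (b < 0x80) {                     /* plain ASCII */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {    /* 2-byte sequence */
            extra = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {    /* 3-byte sequence */
            extra = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {    /* 4-byte sequence */
            extra = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return false;   /* stray continuation or invalid lead byte */
        }

        if (len - i - 1 < extra)
            return false;                   /* sequence is truncated */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = s[i + k];

            if ((c & 0xC0) != 0x80)
                return false;               /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min)
            return false;                   /* overlong encoding */
        if (cp > 0x10FFFF)
            return false;                   /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;                   /* UTF-16 surrogate */

        i += extra + 1;
    }
    return true;
}
```

With this check, a name holding the UTF-8 bytes 0xC3 0xA9 passes, while a lone 0xE9 (the iso-8859-1 form of the same character) and the overlong 0xC0 0xAF are both rejected.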

To complicate the situation even more, Andrew Tridgell posted another reason why, he thinks, the kernel will have to adopt a specific character encoding: case insensitivity. Says Tridge:

The reason is that I think that eventually the Linux kernel will need to efficiently support a userspace policy of case-insensitivity and the only way to do case-insensitive filename operations is to interpret those byte streams as a particular encoding.

Needless to say, the idea of implementing case-insensitive filesystem operations in the kernel was not particularly popular. Not too many kernel hackers want to complicate the filesystem code to implement what they see as being a broken Windows feature to begin with. There are other difficulties as well: case-insensitive matching must be done differently in different languages. The end result is that case-insensitive lookups are not very likely to make it into the kernel anytime soon.

Linus is not averse to trying to help out Samba and other applications which wish to implement case-insensitive behavior, however. He has proposed a new "magic_open()" interface which would make it easier for user space to perform case-insensitive lookups without actually doing that work in the kernel. This interface would likely require quite a bit of work before it would do what the Samba developers need, but something derived from it could just make an appearance in the 2.7 development series.

Meanwhile, the kernel does not seem likely to adopt any sort of official encoding anytime soon. The problems that result from the lack of an encoding policy are mostly seen as user space issues. Proper locale support is still relatively new in Linux, and many rough edges remain. Given the high level of interest in high-quality localization support in Linux, however, one might expect those edges to be smoothed down quickly.

(For those who would like to learn more about UTF-8, see this FAQ or RFC 3629.)

invalidate_page_range() for non-GPL modules

The kernel function invalidate_page_range() is not something which has a lot of callers. Its job is to invalidate all memory mappings which cover a specific part of a file, presumably because the contents of the relevant pages have changed on disk. This function is currently exported only to GPL-licensed modules.

Paul McKenney has requested that this function be exported to all modules. It seems that IBM's GPFS filesystem needs it, and that filesystem is not free software. The claim is that the filesystem is an entirely independent development, and is thus not derived from the kernel; it should not have to be licensed under the GPL to be loadable into the kernel.

Andrew Morton says he is not opposed to the patch. One might think it would not be too controversial, especially since that function was first created and submitted by...Paul McKenney. There are developers, however, who believe that any module which is digging that deeply into the virtual memory subsystem cannot help but be derived, in some fashion, from the Linux kernel. There is also, perhaps, a certain desire to demonstrate that even IBM can't obtain arbitrary access to the kernel for proprietary modules.

In general, the kernel hackers are more interested in seeing their work be useful and used than in fighting licensing battles. So one might expect to see this patch eventually get incorporated. In more recent times, however, some developers have been adopting a firmer position with regard to proprietary modules. This patch may still get in, but it's likely to have a harder time than would have once been the case.

No more 24-bit atomic_t

The atomic_t type in the Linux kernel is a simple integer variable with a set of operations which are guaranteed to be atomic without the need for explicit locking. For years, atomic_t variables have operated under the constraint that they can be expected to hold no more than 24 bits; this limitation was forced by the Sparc32 architecture, which used the other eight bits to implement the atomic operations.

As of 2.6.3, this limitation no longer holds. A patch by Keith M Wesolowski has changed the Sparc32 implementation to a version (taken from the PA-RISC architecture) which provides full 32-bit atomic variables.

The new implementation works by creating a small array (four entries) of spinlocks. When an operation is to be performed on an atomic variable, one of those spinlocks is chosen by a hash function; the code holds the given lock while manipulating the variable. The result is proper locking for atomic operations without doubling the size of every atomic_t in the system. The patch was quickly picked up and merged, and kernel programmers have one less strange limitation to worry about.
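The hashed-lock technique can be sketched in user space as follows. This is an illustration of the idea only, not the Sparc32 or PA-RISC code: the names sketch_atomic_t, lock_for(), and sketch_atomic_add_return() are hypothetical, the hash (a shift and a modulus on the variable's address) is an assumption, and the spinlocks are built from C11 atomic_flag. A kernel version would also have to deal with interrupts, which this sketch ignores.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NR_ATOMIC_LOCKS 4   /* the "four entries" mentioned above */

/* Hypothetical stand-in for atomic_t: a plain, full-width integer. */
typedef struct { volatile int counter; } sketch_atomic_t;

/* The small shared array of spinlocks. */
static atomic_flag atomic_locks[NR_ATOMIC_LOCKS] = {
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Pick a lock by hashing the variable's address (assumed hash). */
static atomic_flag *lock_for(const sketch_atomic_t *v)
{
    uintptr_t addr = (uintptr_t)v;

    return &atomic_locks[(addr >> 4) % NR_ATOMIC_LOCKS];
}

/* Add delta to *v while holding the chosen lock; return the new value. */
static int sketch_atomic_add_return(int delta, sketch_atomic_t *v)
{
    atomic_flag *lock = lock_for(v);
    int result;

    /* Spin until the lock is acquired. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;

    v->counter += delta;
    result = v->counter;

    atomic_flag_clear_explicit(lock, memory_order_release);
    return result;
}
```

Because the lock lives in the shared array instead of inside the variable, none of the variable's bits are reserved, and all 32 are available for the value. The cost is that two unrelated variables which hash to the same lock will contend with each other.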

Patches and updates

Linus Torvalds Linux 2.6.3

Andrew Morton 2.6.3-mm1

Con Kolivas 2.6.3-ck1

Linus Torvalds Linux 2.6.3-rc4

Linus Torvalds Linux 2.6.3-rc3

Andrew Morton 2.6.3-rc3-mm1

Andrew Morton 2.6.3-rc2-mm1

Martin J. Bligh 2.6.2-mjb1

Marcelo Tosatti linux-2.4.25 released

Bernhard Rosenkraenzer 2.4.25-pac1

Marcelo Tosatti Linux 2.4.25-rc4

Marcelo Tosatti Linux 2.4.25-rc3

Marcelo Tosatti Linux 2.4.25-rc2

Benjamin Herrenschmidt Update platinumfb driver

Santiago Leon IBM PowerPC Virtual Ethernet Driver

Keith M Wesolowski RFC: [sparc32] atomic_t is 32 bits

Con Kolivas kernbench-0.20

dan carpenter Strace Test

Keith Owens Announce: kdb v4.3 is available for kernel 2.6.3-rc3

Keith Owens Announce: kdb v4.3 is available for kernel 2.6.3-rc4

Keith Owens Announce: kdb v4.3 is available for kernel 2.6.3

Pavel Machek kgdb-lite for 2.6.2

Pavel Machek kgdb-lite for 2.6.2 (i386 specific parts)

H. Peter Anvin Updated dynamic pty patch available

Jeff Garzik 2.6.x libata update

Jean Tourrilhes new driver : stir4200

Len Brown ACPI for 2.6

Matthew Wilcox Expanded PCI config space (against 2.6.3-rc4)

Michael Frank PATCH, RFC: Version 3 of 2.6 Codingstyle

Michael Frank PATCH, RFC: Version 4 of 2.6 Codingstyle

Michael Frank PATCH, RFC: Version 5 of 2.6 Codingstyle

Jan Kara Journalled quota (fwd)

Miklos Szeredi [PATCH] allowing user mounts

Christophe Saout dm-crypt using kthread

Niraj Kumar [2.6] UFS2 Read Only Patch

Karim Yaghmour relayfs patch for 2.6.3

Randy.Dunlap sys_device_[un]register() are not syscalls

Randy.Dunlap syscalls.h update #9 (open/close)

Randy.Dunlap syscalls.h #10

Valdis.Kletnieks@vt.edu [PATCH} 2.6 and grsecurity

James Morris Event notifications via Netlink

Con Kolivas 2.6.3-rc2 v 2.6.3-rc3-mm1 kernbench

Judith Lebzelter OSDL tiobench Sequential Reads improved with AS due to read-ahead changes

Mariusz Mazur linux-libc-headers 2.6.2.0

Greg KH udev 017 release
