
Kernel development

Brief items

Kernel release status

The current development kernel is 3.1-rc3 (code-named "Divemaster Edition"), released on August 22. Linus says:

And a few thank-yous are in order: things are looking good. The diffstat looks reasonable (the one big addition is in Documentation), and while I could have wished for even less churn, I'm pretty happy. The rc2 to rc3 shortlog is appended, and I think it mostly looks pretty reasonable and short. Which is not to say that I'm not hoping that things will calm down even further in the later rc's, but at least so far I don't think I've had much reason to complain.

See the full changelog for all the details.

Stable updates: no stable updates have been released in the last week, and none are in the review process as of this writing.


Quotes of the week

WhoMeNope. That would imply that I understood it, but my brain is far too small to understand rcutree.c - that's what we have paulmcks for.
-- Andrew Morton

Has anybody ever looked at a real computer? Doesn't anybody know how computer math works any more? Doing that as a (slow) double division on 32-bit is so stupid that it's past even just "wrong". It's way off in la-la-land, sitting in a corner, all hopped up on drugs and painting its nails purple.
-- Linus Torvalds

Some organisations disagree with this and say the license has to explicitly be reinstated according to GPLv2 section 4.

I've talked to quite a few lawyers worldwide and they all think that downloading the software will give you a new license, so I wouldn't be too worried about these organisations.

-- Armijn Hemel on the "GPL death penalty"


Three Linux wireless summit videos

Videos of three talks at the recently concluded Linux wireless summit have been posted. These talks cover the implementation of dynamic frequency selection, 802.11s mesh networking, and mesh network testing with wmediumd.


Kernel development news

Merging the kvm tool

By Jonathan Corbet
August 23, 2011
The "native Linux KVM tool" (which we'll call "NLKT") is a hardware emulation system designed to support virtualized guests running under the KVM hypervisor. It offers a number of nice features, but an attempt to get this code merged into the 3.1 kernel was deferred by Linus, who did not want to deal with another controversial development at that time. This tool's developers have let it be known that it will be back for the 3.2 merge window; controversy is sure to follow. The core question raised by this project is: what code is appropriate for the kernel tree, and which projects should live in their own repositories elsewhere?

NLKT was started in response to unhappiness about QEMU, the state of its code, and the pace of its development. It was designed with simplicity in mind; NLKT is meant to be able to boot a basic Linux kernel without the need for a BIOS image or much in the way of legacy hardware emulation. Despite its simplicity, NLKT offers "just works" networking, SMP support, basic graphics support, copy-on-write block device access, host filesystem access with 9P or overlayfs, and more. It has developed quickly and is, arguably, the easiest way to get a Linux kernel running on a virtualized system.

Everybody seems to think that NLKT is a useful tool; nobody objects to its existence. The controversy comes for other reasons, one of which is the name: the tool simply calls itself "kvm." The potential for confusion with the kernel's KVM subsystem is clear - that is why this article made up a different acronym to refer to the tool. "KVM" is already seen as an unfortunate name - searches for the term bring in a lot of information about keyboard-video-mouse switches - so adding more ambiguity seems like a bad move. It is also seemingly viewed by some as a move to be the "official" hardware emulator for KVM. The NLKT developers have, thus far, resisted a name change, though.

The bigger fight is over whether NLKT belongs in the kernel at all. It is not kernel code; it is a program that runs in user space. The question of whether such code should be in the kernel's repository is certainly the one that will decide whether it is merged for 3.2 or not.

NLKT would not be the first user-space tool to go into the mainline kernel; several others can be found in the tools/ directory. Many of them are testing tools used by kernel developers, but not all. The "cpupower" tool was merged for 3.1; it allows an administrator to tweak various CPU power management features. The most actively developed tool in that directory, though, is perf, which has grown by leaps and bounds since being merged into the mainline. The developers working on perf have been very outspoken in their belief that putting the tool into the mainline kernel repository has helped it to advance quickly.

Proponents say that, like perf, NLKT is closely tied to the kernel and heavily used by kernel developers; like perf, it would benefit from being put into the same code repository. KVM, they say, is also under heavy development; having NLKT and KVM in the same tree would help both to improve more quickly. It would bring more review of any future KVM ABI changes, since a user of that ABI would be merged into the kernel as well. Keeping the hardware emulation code near the drivers that code has to work with is said to be beneficial to both sides. All told, they say, perf would not have been nearly as successful outside of the mainline tree as it has been internally; merging NLKT can be expected to encourage the same sort of success.

That success seems to be one of the things that opponents are worried about; some have worried that the main purpose is to increase the project's visibility so that it succeeds at the expense of competing projects. The ABI development benefits are challenged; any changes would clearly still have to work with tools like QEMU regardless of whether NLKT is in the kernel, so QEMU developers would have to remain in the loop. It is even better, some say, to separate the implementation of an ABI from its users; that forces the implementers to put more effort into documenting how the ABI should be used.

There is also concern that, once we start seeing more user-space tools going into the kernel tree, there will be an unstoppable flood of them. Where does it stop, they ask - should we pull in the C library, the GNU tools, or, maybe, LibreOffice? Linux is not BSD, they say; trying to put everything into a single repository is not the right direction to take. The answer to that complaint is that there is no interest in merging arbitrary tools; only those that are truly tied to the kernel would qualify. By this reasoning, NLKT is an easy decision. A C library is something that could be considered; perhaps even graphics if the relevant developers wanted to do that. But office suites are not really eligible; there are limits to what should go into the mainline.

That was where the discussion stood at the beginning of the 3.1 merge window; Linus decided not to pull NLKT at that time. Instead, he clearly wanted the discussion to continue; he told the NLKT developers that they would have to convince him in the 3.2 merge window instead. It looks like that process is about to begin; the NLKT repository is about to be added to linux-next in anticipation of a pull request once the merge window opens. This time, with luck, we'll have a resolution of the issue that gives some guidance for those who would merge other user-space tools in the future.


The udev tail wags the dog

By Jonathan Corbet
August 24, 2011
It is not unheard of for kernel developers to refuse to support a particular user-space interface that, they think, is poorly designed or hard to maintain into the future. A user-space project refusing to use a kernel-provided interface in the hope of forcing the creation of something better is a rather less common event. That is exactly what is happening with the udev project's approach to device tree information, though; the result could be a rethinking of how that information gets to applications.

OLPC laptops have, among their other quirks, a special keyboard which requires the loading of a specific keymap to operate properly. For the older generations of laptops, loading this keymap has been easily handled with a udev rule:

    ENV{DMI_VENDOR}=="OLPC", ATTR{[dmi/id]product_name}=="XO", \
        RUN+="keymap $name olpc-xo"

This rule simply extracts the name of the machine from the desktop management interface (DMI) data that has been made available in sysfs. If that data indicates that the system is running on an XO laptop, the appropriate keymap file is loaded. DMI is an x86-specific interface, though, and the upcoming (1.75) generation of the XO laptop is not an x86-based machine. There is no DMI information available on that laptop, so this rule fails; some other solution will be needed.
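For illustration, the check that this rule performs can be approximated in a few lines of Python. This is only a sketch: it assumes that the DMI strings are exposed as the files sys_vendor and product_name under /sys/class/dmi/id, and the function names are invented for this example. It is not how udev evaluates rules internally.

```python
from pathlib import Path

DMI_DIR = Path("/sys/class/dmi/id")  # assumed location of the DMI attributes


def read_attr(directory, name):
    """Return the stripped contents of a sysfs attribute, or None if absent."""
    try:
        return (Path(directory) / name).read_text(errors="replace").strip()
    except OSError:
        return None


def is_olpc_xo(dmi_dir=DMI_DIR):
    """True only when both DMI strings match, mirroring the rule above."""
    return (read_attr(dmi_dir, "sys_vendor") == "OLPC"
            and read_attr(dmi_dir, "product_name") == "XO")
```

On a machine with no DMI directory at all, such as the ARM-based XO described here, both reads return None and is_olpc_xo() is simply false, which is the same silent failure that the udev rule exhibits.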

In current times, the source for hardware description information - especially on non-discoverable platforms - is supposed to be the device tree structure. So Paul Fox's solution would seem to make sense: he created a new rule with a helper script to extract the machine identification from the device tree, which happens to be available in /proc/device-tree. It almost certainly came as a surprise when this solution was rejected by udev maintainer Kay Sievers, who said:

Reading such things from /proc is kinda taboo from code we maintain in udev. All things not related to processes really do not belong into /proc and udev code should never get into the way of possibly deprecating these things in the long run, even when they might never happen. I know, there is sometimes no other simple option, but we generally prefer the inconvenience it causes, over adding hacks to upstream code, to make a move to a generally useful solution (which isn't /proc/*) more attractive.

Of course, Paul wasn't adding the /proc/device-tree interface; criticism of such a move would not have been surprising. That file has a long history; it has been supported, under some architectures, since the 2.2 kernel. So one might think that it is a bit late to be complaining about it; there are a number of /proc files added in those days which would not be allowed into /proc now. In general, those files are considered to be part of the user-space ABI at this point; like it or not, we are stuck with them. The device tree file has been around for long enough that it almost certainly falls into that category; it's hard to imagine that it would have been maintained for so long if there were no programs making use of it. Whether or not the udev developers like it, /proc/device-tree is not likely to go anywhere else anytime soon.
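To make the rejected approach concrete, a helper of the kind described might look something like the following. This is not Paul Fox's script; it is a hypothetical sketch which assumes that the machine name is stored in a `model` property at the root of /proc/device-tree and that string properties there are NUL-terminated.

```python
from pathlib import Path

DT_ROOT = Path("/proc/device-tree")  # the interface discussed above


def dt_string(prop, root=DT_ROOT):
    """Read a device tree string property; return None if it is missing."""
    try:
        raw = (Path(root) / prop).read_bytes()
    except OSError:
        return None
    # String properties carry a trailing NUL; keep only the first string.
    return raw.split(b"\0", 1)[0].decode("ascii", errors="replace")
```

A udev rule could then compare the output of dt_string("model") against the expected machine name, much as the DMI rule compares product_name.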

That still doesn't mean that the udev developers have to make use of it, though, and they seem determined to hold out for something better. Quoting Kay again:

No, sorry, the time for dirty hacks in userspace, and work-arounds for architectures and platforms that don't provide what is commonly used elsewhere is over. There is no rush, it's new functionality, and no need to start with 'transitions periods' that in reality will never end. Stuff just needs to be fixed properly these days, and papering over just hurts us more in the end.

Kay would like to see the machine identification information exposed separately somewhere under /sys; it has even been suggested that platforms using device trees could emulate the DMI directory found on x86 systems. That, to them, looks like a longer-term solution that doesn't put udev in the position of blocking an ABI cleanup sometime in the future.

In essence, what we have is a user-space utility telling the kernel that an interface it has supported for well over a decade is unacceptable and needs to be replaced. To force that replacement, udev is refusing to accept changes that make use of the existing interface. Whether that is a proper course of action depends on one's perspective. To some, it will look like a petty attempt to force kernel developers to maintain two interfaces with duplicate information in the hope that a long-lived /proc file will eventually go away, despite its long history. To others, it will seem like a straightforward attempt to help the kernel move toward interfaces that are more supportable in the long term.

In this particular case, it looks like udev probably wins. Adding the machine identification somewhere in sysfs will be easy enough that it is probably not worth the effort to fight the battle. In a more general sense, this episode shows that the kernel ABI is not just something handed down to user space from On High. User-space developers will have their say, even a dozen years after the interface has been established; that is a good thing. Having more developers look at these issues from both sides of the boundary can only help in the creation of better interfaces.


LinuxCon: x86 platform drivers

By Jake Edge
August 24, 2011

With his characteristically dry British humor, Matthew Garrett outlined the current situation with x86 platform drivers at LinuxCon. These drivers are needed to handle various "extra" hardware devices, like special keys, backlight control, extended battery information, fans, and so on. There is a wide range of control mechanisms that hardware vendors use for these devices, and, even when the controller hardware is the same, different vendors will choose different mechanisms to talk to the devices. It is a complicated situation that seems to require humor, and perhaps alcohol, to master.


Garrett does a variety of things for Red Hat, including hardware support and firmware interfaces (e.g. for EFI). Mostly he does "stuff that nobody else is really enthusiastic about doing", he said. Platform drivers are "bits of hardware support code" that are required to make all of the different pieces of modern hardware function with Linux. Today's hardware is not the PC of old and it requires code to make things work, especially for mobile devices.

He started by looking at keys, those used to type with, but also those that alter display brightness or turn hardware (e.g. wireless) on and off. The "normal" way that keys have been handled is that a key press causes an interrupt, the kernel reads a value from the keyboard controller, and the keycode gets sent on to user space. The same thing happens for a key up event. This is cutting edge technology from "1843 or something", which is very difficult to get wrong, though some manufacturers still manage to do so. The first thing anyone writes when creating a "toy OS" is the keyboard driver because it is so straightforward.
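From user space, the end of that path is visible as a stream of fixed-size records on an input device node. The sketch below decodes such records; it assumes the traditional `struct input_event` layout in native byte order (two longs for the timestamp, then type, code, and value), and the function name is invented for the example.

```python
import struct

EVENT_FORMAT = "llHHi"          # timeval (sec, usec), type, code, value
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)
EV_KEY = 0x01                   # event type used for key presses


def decode_key_events(data):
    """Yield (code, state) for each key event in a buffer of raw records."""
    states = {0: "up", 1: "down", 2: "repeat"}
    for offset in range(0, len(data) - EVENT_SIZE + 1, EVENT_SIZE):
        _sec, _usec, etype, code, value = struct.unpack_from(
            EVENT_FORMAT, data, offset)
        if etype == EV_KEY:
            yield code, states.get(value, "unknown")
```

Feeding this function the bytes read from one of the nodes under /dev/input (which normally requires suitable permissions) shows the key-down and key-up pairs described above, whatever convoluted path the event took through the firmware to get there.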

In contrast to that simple picture, Garrett then described what goes on for getting key event information on a Sony laptop. The description was rather baroque and spanned three separate slides. Essentially, the key causes an ACPI interrupt, which requires the kernel to do a multi-step process executing "general purpose event" (GPE) code in the ACPI firmware, and calling ACPI methods to eventually get a key code that ends up being sent to user space. "This is called value add", he said.

Manufacturers are convinced that you don't want to manage WiFi the same way on multiple devices. Instead, they believe you want to use the "Lenovo wireless manager" (for example) to configure the wireless device. "Some would call them insane", and Garrett is definitely in that camp. The motivation seems to be an opportunity for the device maker to splash their logo onto the screen when the manager program is run. As might be guessed, there is no documentation available because that would allow others to copy the implementation, which obviates the supposed value add.

It is not just keyboards that require platform drivers, Garrett said. Controlling radios, ambient light sensors ("everyone wants the brightness to change when someone walks behind them"), extended battery information (using identical battery controller chips, with the interface implemented differently on each one), hard drive protection (which always uses the same accelerometer device), backlight control, CPU temperature, fan control, LEDs (e.g. a "you have mail" indicator, that is "not really useful" but is exposed "for people who don't have anything better to do with their lives"), and more, all need these drivers.

Multiple control mechanisms

There are half-a-dozen different interfaces that these drivers will use to control the hardware, starting with plain ACPI calls. That is generally one of the easiest methods to use, because it is relatively straightforward to read the ACPI tables and generate a driver from that information. Events are sent to the driver, along with an event type, and some reverse engineering is required to work out what the types are and what they do. There are specific ACPI calls to get more information about the event as well. Garrett's example showed two acpi_evaluate_object() calls for the AUSB ("attach USB") and BTPO ("Bluetooth power on") ACPI methods, which is all that is needed to turn on Bluetooth for a Toshiba device. "Wonderful", he said.

A small micro-controller with closed-source firmware—the embedded controller—is another means to control hardware. Ideally, you shouldn't have to touch the embedded controller because ACPI methods are often provided to do so. But, sometimes you need to access the registers of the controller to fiddle with GPIO lines or read sensor data stored there. The problem is that these register locations can and do change between BIOS versions. While it is "considered bad form to write a driver for a specific BIOS version", sometimes you have to do so. It is a fairly fragile interface, he said.

Windows Management Instrumentation (WMI) is a part of the Windows driver model that Microsoft decided would be nice to glue into ACPI. It has methods that are based on globally unique IDs (GUIDs) corresponding to events. A notify handler is registered for a GUID and it gets called when that event happens. The Managed Object Format (MOF) code that comes with a given WMI implementation is supposed to be self-documenting, but there is a problem: it is compressed inside the BIOS using a Microsoft proprietary compression tool "that we don't know how to decompress". As an example of a WMI-based driver, Garrett showed a Dell laptop keyboard handling driver that reports the exact same keycode that would have come from a normal keyboard controller, but was routed through WMI instead, "because this is the future", he said.

Drivers might also be required to make direct BIOS calls, which necessitates the use of a real mode int instruction. This is "amazingly fragile" and incompatible with 64-bit processors. Currently, the only time BIOS interrupts are invoked from user space is for X servers and Garrett suggests that drivers should "never do this". In fact, he went further than that: "If you ever find hardware that does this, tell me and I will send you money for new hardware". If you decide to write code that implements this instead, he said that he would pay someone else money to "set fire to your house".

System Management Mode (SMM) traps are yet another way to control hardware, but there seems to be a lot of magic involved. There are "magic addresses" that refer to memory that is hidden from the kernel. In order to use them, a buffer is set up and the address is poked, at which point the "buffer contents magically change". There have been various problems with the SMM implementations from hardware vendors including some HP hardware that would get confused if SMM was invoked from anything other than CPU 0. Garrett did not seem particularly enamored of this technique, likening it to the business plan of the "Underpants Gnomes".

The last control mechanism Garrett mentioned is to use a native driver to access the hardware resources directly. Typically these drivers use ACPI to identify that the hardware exists. The hardware is accessed using the port IO calls (i.e. inb(), outb()), and will use native interrupts to signal events. Various models of Apple hardware use these kinds of drivers, Garrett said.

Consistent interfaces

While there are many ways to access the hardware, kernel hackers want to provide a consistent interface to these devices. We don't want "to have to install the Sony program to deal with WiFi". So, "hotkeys" are sent through the input system, "keys are keys". Backlight control is done via the backlight class. Radio control is handled with rfkill, thermal and fan state via hwmon, and the LED control using the led class. That way, users are insulated from the underlying details of how their particular hardware implements these functions.
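The payoff of these common classes is that one small piece of user-space code works regardless of which mechanism the platform driver uses underneath. As a sketch, the following reads every backlight device through the backlight class; it assumes the usual sysfs layout (a brightness and a max_brightness file in each directory under /sys/class/backlight), and the function name is made up for the example.

```python
from pathlib import Path

BACKLIGHT_CLASS = Path("/sys/class/backlight")  # assumed sysfs location


def backlight_levels(class_dir=BACKLIGHT_CLASS):
    """Return {device: percent} for each backlight device that can be read."""
    levels = {}
    class_dir = Path(class_dir)
    if not class_dir.is_dir():
        return levels
    for dev in sorted(class_dir.iterdir()):
        try:
            current = int((dev / "brightness").read_text())
            maximum = int((dev / "max_brightness").read_text())
        except (OSError, ValueError):
            continue  # skip devices with missing or malformed attributes
        if maximum > 0:
            levels[dev.name] = round(100 * current / maximum)
    return levels
```

Nothing in this code knows whether the brightness is ultimately set by an ACPI method, a WMI call, or a native driver; that insulation is the point of the class interface.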

There are two areas that still have inconsistent interfaces, Garrett said. The hard drive protection feature that is meant to park the disk heads when an untoward acceleration is detected (e.g. the laptop is dropped) does not have a consistent kernel interface. Also, the ambient light sensors are lacking an interface. The latter has become something of a running joke in the kernel community, he said, because Linus Torvalds thinks it should be done one way, but the maintainer disagrees, so, as yet, there is no consistent interface.

How do I work this?

Garrett also had some suggestions on figuring out how new/unsupported hardware is wired up. There is a fair amount of reverse engineering that must be done, but the starting point is to use the acpidump and acpixtract utilities to find out what is in the ACPI code in the hardware.

If the device is WMI-based, wmidump may also be useful. Extracting the event GUIDs and registering a handler for each will allow one to observe which ones fire for various external events. Then it is a matter of flipping switches to see what happens, parsing the data that is provided with the event, and figuring out how to do something useful. This may require alcohol, he said.

For embedded controllers or direct hardware access, there are sysfs and debugfs files that can be useful. The embedded controller can be accessed via /sys/kernel/debug/ec/ec0/io (at least for those who have debugfs mounted), or by using the ec_access utility. Once again, you need to hit buttons, throw various switches, and listen for fan changes. In addition, you should test that the register offsets are stable for various machine and BIOS version combinations, he said. You can find the IDs of devices to access them directly via the /sys/bus/pnp/devices/*/id files, register as a PNP bus driver for devices of interest, and then "work out how to drive the hardware".
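The "hit a button and see what changed" step lends itself to a small script. The sketch below compares two snapshots of the embedded controller's register space; it assumes that the debugfs file named above can be read as a flat array of bytes (which normally requires root and a mounted debugfs), and the helper names are invented here.

```python
EC_IO = "/sys/kernel/debug/ec/ec0/io"  # path mentioned in the talk


def snapshot(path=EC_IO):
    """Return the current register contents as bytes."""
    with open(path, "rb") as ec:
        return ec.read()


def changed_registers(before, after):
    """List (offset, old, new) for every register that differs."""
    return [(offset, old, new)
            for offset, (old, new) in enumerate(zip(before, after))
            if old != new]
```

Typical use would be to take one snapshot, flip the switch of interest, take a second, and print changed_registers() for the pair. Some registers, such as temperature readings, may well change on their own, so repeating the experiment a few times helps to separate those from the ones that track the switch.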

The overall picture that Garrett painted is one of needless complexity and inconsistency that is promulgated by the hardware vendors. But, it is something that needs to be handled so that all of the "extras" baked into today's hardware work reliably—consistently—with Linux. While it would be nice if all of these components were documented in ways that Linux driver writers could use, that doesn't seem likely to change anytime soon. Until then, Garrett and the rest of the kernel community will be wrestling with these devices so that we don't get locked into manufacturer-specific control programs.

[ I would like to thank the Linux Foundation for travel assistance to attend LinuxCon. ]



Page editor: Jonathan Corbet

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds