Kernel development

Brief items

Kernel release status

The 3.12 merge window is still open, so there is no development kernel as of this writing.

Stable updates: 3.10.11, 3.4.61, and 3.0.95 were all released on September 7; 3.2.51 came out on September 11.

Quotes of the week

Dropping the spinlocks means more cores; unfortunately, a quad-core seems to be the limit. Users must divide their time between reading history and contributing to the present: some amount of persistent data is a must on every user's machine. Pixel seems to be heading in the wrong direction: that's what is stressing us out.
— Somebody seems to have unleashed a robot on linux-kernel.

Let's see if I can remember the candidates...

	rcu_is_cpu_idle() # reversed sense from the others
	rcu_is_ignored() # reversed sense from the others
	rcu_is_not_active() # reversed sense from the others
	rcu_is_watching_cpu()
	rcu_read_check()
	rcu_is_active()
	rcu_is_active_local()
	rcu_is_online()
	rcu_is_watching_task()
	rcu_is_watching_thread()
	rcu_is_watching_you()
	all_your_base_are_belong_to_rcu()
	rcu_is_active_loco()
	rcu_kilroy_was_here()

Maybe I should just lock them all in a room overnight and see which are still alive in the morning.

Paul McKenney struggles with naming

Kernel development news

3.12 merge window, part 2

By Jonathan Corbet
September 11, 2013
As of this writing, nearly 8,500 non-merge changesets have been pulled into the mainline repository for the 3.12 development cycle; almost 5,000 of those have been pulled since last week's summary. The process was slowed somewhat when Linus's primary disk drive failed, but not even hardware failure can stop the kernel process for long.

This development cycle continues to feature a large range of internal improvements and relatively few exciting new features. Some of the user-visible changes that have been merged include:

  • The direct rendering graphics layer has gained the concept of "render nodes," which separate the rendering of graphics from modesetting and other display control; the "big three" graphics drivers all support this concept. See this post from David Herrmann for more information on where this work is going.

  • The netfilter subsystem supports a new "SYNPROXY" target that simulates connection establishment on one side of the firewall before actually establishing the connection on the other. It can be thought of as a way of implementing SYN cookies at the perimeter, preventing spurious connection attempts from traversing the firewall.

  • The TSO sizing patches and FQ scheduler have been merged. TSO sizing helps to eliminate bursty traffic when TCP segmentation offload is being used, while FQ provides a simple fair-queuing discipline for traffic transiting through the system.

  • The ext3 filesystem has a new journal_path= mount option that allows the specification of an external journal's location using a device path name.

  • The Tile architecture has gained support for ftrace, kprobes, and full kernel preemption. Also, support for the old TILE64 CPU has been removed.

  • The xfs filesystem is finally able to support user namespaces. The addition of this support should make it easier for distributors to enable the user namespace feature, should they feel at ease with the security implications of such a move.

  • Mainline support for ARM "big.LITTLE" systems is getting closer; 3.12 will include a new cpuidle driver that builds on the multi-cluster power management patches to provide CPU idle support on big.LITTLE systems.

  • The MD RAID5 implementation is now multithreaded, increasing its maximum I/O rates when dealing with fast drives.

  • The device mapper has a new statistics module that can track I/O activity over a range of blocks on a DM device. See Documentation/device-mapper/statistics.txt for details.

  • The device tree code now feeds the entire flattened device tree text into the random number pool in an attempt to increase the amount of entropy available at early boot. It is not clear at this point how much benefit is gained, since device trees are mostly or entirely identical for a given class of device. It is possible for a device tree to hold unique data — network MAC addresses, for example — but that is not guaranteed, and some developers think that entropy would be better served by just feeding the unique data directly.

  • New hardware support includes:

    • Systems and processors: Freescale P1023 RDB and C293PCIE boards.

    • Graphics: Qualcomm MSM/Snapdragon GPUs. The nouveau graphics driver has also gained proper power management support, and the power management support for Radeon devices has been improved and extended to a wider range of chips.

    • Miscellaneous: GPIO-controlled backlights, Sanyo LV5207LP backlight controllers, Rohm BD6107 backlight controllers, IdeaPad laptop slidebars, Toumaz Xenif TZ1090 GPIO controllers, Kontron ETX/COMexpress GPIO controllers, Fintek F71882FG and F71889F GPIO controllers, Dialog Semiconductor DA9063 PMICs, Samsung S2MPS11 crystal oscillator clocks, Hisilicon K3 DMA controllers, Renesas R-Car HPB DMA controllers, and TI BQ24190 and TWL4030 battery charger controllers.

    • Networking: MOXA ART (RTL8201CP) Ethernet interfaces, Solarflare SFC9100 interfaces, and CoreChip-sz SR9700-based Ethernet devices.

    • Video4Linux: Renesas VSP1 video processing engines, Renesas R-Car video input devices, Mirics MSi3101 software-defined radio dongles (the first SDR device supported by the mainline kernel), Syntek STK1135 USB cameras, Analog Devices ADV7842 video decoders, and Analog Devices ADV7511 video encoders.

Changes visible to kernel developers include:

  • The GEM and TTM memory managers within the graphics subsystem are now using a unified subsystem for the management of virtual memory areas, eliminating some duplicated functionality.

  • The new lockref mechanism can now mark a reference-counted item as being "dead." The separate state is needed because lockrefs can be used in places (like the dentry cache) where an item can have a reference count of zero and still be alive and usable. Once the structure has been marked as dead, though, the reference count cannot be incremented and the structure cannot be used.
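The distinction between "zero references" and "dead" can be illustrated with a small Python model. This is only a sketch of the behavior described above, not the kernel's implementation: the class and method names are this example's own, and a plain `threading.Lock` stands in for the spinlock.

```python
import threading


class LockRef:
    """Toy model of a lock plus reference count with a separate 'dead' state."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0      # zero references is still a live, usable object
        self.dead = False   # set once by mark_dead(); never cleared

    def get_not_dead(self):
        """Take a reference unless the object is dead; return True on success."""
        with self._lock:
            if self.dead:
                return False
            self.count += 1
            return True

    def put(self):
        """Drop one reference."""
        with self._lock:
            if self.count <= 0:
                raise ValueError("put() without a matching get")
            self.count -= 1

    def mark_dead(self):
        """Mark the object dead; later get_not_dead() calls will fail."""
        with self._lock:
            self.dead = True


entry = LockRef()
assert entry.get_not_dead()      # count goes 0 -> 1
entry.put()                      # back to 0, but still alive
assert entry.get_not_dead()      # a zero count did not prevent reuse
entry.put()
entry.mark_dead()
assert not entry.get_not_dead()  # once dead, no new references
```

A count of zero alone cannot signal "do not use," which is why the model carries the dead state separately from the count.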

The closing of the merge window still looks to happen on September 15, or, perhaps, one day later to allow Linus to get back up to speed after his planned weekend diving experience.

Opening up kernel security bug handling

By Jake Edge
September 11, 2013

The reporting and handling of security issues is a tricky proposition. There are numerous competing interests to try to balance, and a general tendency toward secrecy that can complicate things further. Thus it is not surprising that kernel developers are discussing security handling on the Kernel Summit discussion mailing list (ksummit-2013-discuss). It seems likely that discussion will pick up again at the summit itself, which will be held in Edinburgh, October 23-25.

James Bottomley kicked off the discussion by noting that several recent fixes had gone into the kernel without following the normal process because they were "security fixes". Given that some of those fixes caused problems of various sorts, he is concerned about circumventing the process simply because the patches fix security issues:

In both cases we had commits with cryptic messages, little explanation and practically no review all in the name of security.

Our core processes for accepting code require transparency, review and testing. Secrecy in getting code into the kernel is therefore fundamentally breaking this and risking the kinds of problems we see in each of the instances.

Bottomley would like to explore whether security vulnerabilities need to be handled in secret at all. Since he expects that idea to be unpopular, he suggested a reasonable alternative: looking into what can be done to inject more transparency into the process. Part of his theory is that "security people" who "love secrecy" are running the vulnerability-handling process.

For example, the closed kernel security mailing list (security@kernel.org) is either made up of "security officers" (according to Documentation/SecurityBugs) or "'normal' kernel developers" (according to Greg Kroah-Hartman). There is no inherent interest in secrecy by the participants on that list, Kroah-Hartman said, though he did agree that posting a list of the members of security@kernel.org—which has not yet happened—would help to make things more transparent. The relationship between the kernel security list and the linux-distros mailing list (a closed list for distribution security concerns—the successor to vendor-sec) is also a bit murky, which could use some clearing up, Bottomley said.

A big part of the problem is that there are a few different constituencies to try to satisfy, including distributions (some of which, like enterprise distributions, may have additional needs or wants), users (most of whom get their kernel from a distributor or device maker), security researchers (who sometimes like to make a big splash with their findings), and so on. While it might be tempting to dismiss the security researchers as perpetrators of what Linus Torvalds likes to call "the security circus", it is important to include them. They are often the ones who find vulnerabilities; annoying them often results in them failing to report what they find, sadly.

Secrecy in vulnerability handling may be important to the enterprise distributions for other reasons, as Stephen Hemminger said. Security vulnerabilities and response time are often used as a "sales" tool in those markets, so that may lead to a push for more secrecy:

It seems to me that the secrecy is more about avoiding sensationalist news reports that might provide FUD to competitors. For the enterprise products this kind of FUD might impact buying decisions and even the financial markets.

Torvalds's practice of hiding the security implications of patches also plays a role here. He wants to mask vulnerabilities so that "black hats" cannot easily grep them from commit logs, but as James Morris pointed out, that's not really effective: "The cryptic / silent fixes are really only helping the bad guys. They are watching these commits and doing security analysis on them."

It seems unlikely (though perhaps not completely impossible) that Torvalds would change his mind on the issue, so various ideas on collecting known security information correlated with the commit(s) that fixed them were batted around. Clearly, some information about security implications only comes to light after the commit has been made—sometimes long after—so there is a need to collect it separately in any case.

Kees Cook described some of the information that could be collected, while Andy Lutomirski expanded on the idea by suggesting separate CVE files stored in the kernel tree. The idea seemed fairly popular; others chimed in with suggestions for collaborating with Debian and/or the linux-distros mailing list participants. In a separate sub-thread, Lutomirski created a template for how the information could be stored. Cook concurred and suggested that the files could live under Documentation/CVEs or something similar. It is clear that there is an interest in having more data available on security vulnerabilities and fixes in the kernel, so that could lead to a lively discussion in October.

Some seem to have already started down the path of more openness in the security reporting realm. Lutomirski recently publicly posted a fix that was clearly marked as a security fix from the outset. Cook did much the same with a list of vulnerabilities in the kernel's human interface device (HID) code. Exploiting the HID bugs requires physical access and specialized devices, but that may be part of the threat model for certain users. These aren't the first reports of this kind; others have been made from time to time. In fact, certain subsystems (networking, in particular) essentially never use the closed list and prefer to work on security problems and fixes in the open.

An even more recent example comes from Wannes Rombouts's report of a networking security hole (use after free), which was referred to the netdev mailing list by security@kernel.org. The implications of the bug were not completely clear (either to Rombouts or to Hemminger, who replied), but Ben Hutchings recognized that user namespaces could make the problem more widespread (when and if they are enabled in most kernels anyway). Though it is networking related—thus the referral to netdev, presumably—this is the kind of vulnerability that could have been handled behind closed doors. But because it was posted to an open list, the full implications of the problem were discovered. In addition, for this bug (as well as for Lutomirski's and Cook's bugs), those affected have the ability to find out about the problems and either patch their kernels or otherwise mitigate the problem. And that is another advantage of openness.

BSD-style securelevel comes to Linux — again

By Jonathan Corbet
September 11, 2013
Most of the hand-wringing over the UEFI secure boot mechanism has long passed; those who want to run Linux on systems with secure boot enabled are, for the most part, able to do so. Things are quiet enough that one might be tempted to believe that the problem is entirely solved. As it happens, though, the core patches that implement the lockdown that some developers think is necessary for proper secure boot support still have not made their way into the mainline. The developer behind that work is still trying to get it merged though; in the process, he has brought back an old idea that was last rejected in 1998.

By Matthew Garrett's reading of the secure boot requirements, a system running in secure boot mode must not allow any user to change the running kernel; not even root is empowered to do so. Just over one year ago, Matthew posted a set of patches that implemented the necessary restrictions. In secure boot mode (as defined by the absence of a new capability called, at that time, CAP_SECURE_FIRMWARE), the kernel would not allow the loading of unsigned kernel modules, direct access to I/O ports or I/O memory, or, most controversially, use of the kexec_load() system call to reboot directly into a new kernel. As one might expect, not everybody liked this type of restriction, which flies in the face of the longstanding Unix tradition of giving root enough rope to shoot itself in the foot.

So there were discussions around various aspects of these patches, but one of the biggest problems only came to light later. It seems that there is a fundamental flaw in the capability model: it is nearly impossible to add new capability bits without risking problems with applications that do not know about the new bits. In particular:

  • Some capability-aware applications work by turning off every capability that they do not think they need. If a new bit is added controlling functionality that such an application uses, it will unknowingly disable a necessary capability and cease to work properly. From the point of view of users of this application, this kind of change constitutes an incompatible ABI change.

  • Other applications work in a blacklist-oriented mode, turning off capabilities that are known not to be needed. In essence, such an application starts with its full capability mask and clears only the bits corresponding to the capabilities it knows it can do without. If some sort of security-related functionality is put behind a new bit that is unknown to this kind of application, that application will leave the capability enabled. That, in turn, could make the application insecure.

In this case, the biggest risk is that whitelist-style applications would inadvertently turn off CAP_SECURE_FIRMWARE, essentially putting themselves into secure boot mode even if the system as a whole is not running in that mode. That could cause things to break in mysterious ways. What it comes down to is that, if one is designing a capability-based system, one really must come up with the full list of needed capabilities at the outset. Back in 1998, when capabilities for Linux were being hashed out, nobody had UEFI secure boot in mind. So there is no relevant capability bit available, and adding one now is not really an option.
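The two failure modes can be shown with a toy bitmask model. The sketch below is not the kernel's capability API; the bit names and values are hypothetical, with `CAP_NEW` standing in for any capability added after the applications were written.

```python
# Hypothetical capability bits; names and values are illustrative only.
CAP_NET_BIND = 1 << 0
CAP_SYS_TIME = 1 << 1
CAP_NEW = 1 << 2   # stands in for a bit added after the apps were written

ALL_CAPS = CAP_NET_BIND | CAP_SYS_TIME | CAP_NEW


def whitelist_app(current):
    """Keep only the capabilities this application knows it needs."""
    needed = CAP_NET_BIND
    return current & needed


def blacklist_app(current):
    """Drop only the capabilities this application knows it does not need."""
    unneeded = CAP_SYS_TIME
    return current & ~unneeded


wl = whitelist_app(ALL_CAPS)
bl = blacklist_app(ALL_CAPS)

print("whitelist app holds new bit:", bool(wl & CAP_NEW))  # False
print("blacklist app holds new bit:", bool(bl & CAP_NEW))  # True
```

Neither application mentions `CAP_NEW`, yet the whitelist-style one silently loses it while the blacklist-style one silently keeps it. With a capability like CAP_SECURE_FIRMWARE, whose absence enables the restrictions, losing the bit is what would drop an application into secure boot mode by accident.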

More recently, Matthew posted a new patch set that eliminates the new capability. Instead, all of the secure boot restrictions were tied to the existing flag controlling whether unsigned kernel modules can be loaded. Matthew's reasoning was that the restriction on module loading exists to prevent the loading of arbitrary code into the running kernel, so it made sense to lock down any other functionality that might make it possible to evade that restriction. Other developers disagreed, though, saying that they needed the ability to restrict module loading while still allowing other functionality — kexec_load() in particular — to be used normally. After some discussion, Matthew backed down and withdrew the patches.

Eventually he came back with what he called his final attempt at providing a kernel lockdown facility that wasn't tied to the secure boot mechanism itself. This time around, we have a new sysfs file at /sys/kernel/security/securelevel that accepts any of three values. If it is set to zero (the default), everything works as it always has, with no new restrictions. Setting it to one invokes "secure mode," in which all of the restrictions related to secure boot go into effect. Secure mode is also irrevocable; once it has been enabled, it cannot be disabled (short of compromising the kernel, at which point the battle is already lost). There is also an interesting "permanently insecure" mode obtained by setting securelevel to -1; the system's behavior is the same as with a setting of zero, but it is no longer possible to change the security level.
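The transitions between the three values amount to a small state machine, sketched here in Python. This is a model of the behavior as described, not the patch itself; the class name, the exception types, and the choice to accept a rewrite of the current value as a no-op are all assumptions of this example.

```python
class SecureLevel:
    """Toy model of the three securelevel values: -1, 0, and 1."""

    def __init__(self):
        self.level = 0  # default: everything works as it always has

    def set(self, new_level):
        if new_level not in (-1, 0, 1):
            raise ValueError("securelevel must be -1, 0, or 1")
        if self.level != 0 and new_level != self.level:
            # Both 1 (secure mode) and -1 (permanently insecure) are final.
            # Assumption: rewriting the current value is a harmless no-op.
            raise PermissionError("securelevel can no longer be changed")
        self.level = new_level

    def restricted(self):
        """True when the secure-boot-style restrictions are in effect."""
        return self.level == 1


boot = SecureLevel()
boot.set(1)                 # what the bootstrap code would do under secure boot
try:
    boot.set(0)             # an attempt to leave secure mode
except PermissionError as err:
    print("refused:", err)
```

Only the default value of zero has outgoing transitions; once either of the other two values has been written, the level is fixed for the life of the running kernel.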

In the UEFI secure boot setting, the bootstrap code would take pains to set securelevel to one before allowing any processes to run. That helps to avoid race conditions where the system is subverted before the lockdown can be applied.

Some readers will, by now, have recognized that "securelevel" looks an awful lot like the BSD functionality that goes by the same name; it was clearly patterned after BSD's version. Amusingly, this is not the first time that securelevel has been considered for Linux; there was an extensive discussion on the subject in early 1998, when Alan Cox was pushing strongly for a securelevel feature. At that time, Linus rejected the feature because he had something much better in mind: capabilities. As is usually the case, Linus won out, and Linux got capabilities instead of securelevel.

More than fifteen years later, it seems that we might just end up with both mechanisms. Thus far, Matthew's latest patch set has not resulted in many screams of agony, so it might just pass review this time — though, at this point, it is almost certainly too late for 3.12. Meanwhile, Vivek Goyal has posted the first version of a signed kexec patch set that would limit kexec_load() to signed images. That would allow some useful features (kdump, for example) to continue to work properly in the secure boot environment without leaving kexec_load() completely open. That, too, will make the secure boot restrictions a bit more palatable and increase their chances of being merged.

Patches and updates

Filesystems and block I/O

  • Marco Stornelli: pramfs (September 9, 2013)

Page editor: Jonathan Corbet

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds