
Kernel development

Brief items

Kernel release status

The current development kernel is 3.14-rc1, released on February 2. Everybody hoping for a π-oriented codename for this release will be disappointed: "I realize that as a number, 3.14 looks familiar to people, and I had naming requests related to that. But that's simply not how the nonsense kernel names work. You can console yourself with the fact that the name doesn't actually show up anywhere, and nobody really cares. So any pi-related name you make up will be *quite* as relevant as the one in the main Makefile, so don't get depressed." Instead, this kernel is named "Shuffling zombie juror."

Stable updates: no stable updates have been released in the last week. As of this writing, the 3.13.2 (140 patches), 3.12.10 (133 patches), 3.10.29 (104 patches) and 3.4.79 (37 patches) updates are in the review process; they can be expected on or after February 6.

The 2.6.34.15 update is also in the review process; it contains 213 patches. Paul Gortmaker writes: "This will be the last release on 2.6.34.x ; people should be making migration plans to newer kernels. As such, the focus here has been with CVE items, data leaks, and bugs that could trigger BUG/oops."


Quotes of the week

Let me also stress that although very exciting, this effort is still experimental, so I would like to make sure that nobody makes excessive expectations based on these few patches. The scope of this work is strictly limited to Tegra (although given the similarities desktop GPU support will certainly benefit from it indirectly), and we do not have any plan to work on user-space support. So do not uninstall that proprietary driver just yet. ;)
— NVIDIA's Alexandre Courbot contributes to Nouveau

Hey, this time I'm raising a thumb for nvidia.
Linus Torvalds

At Korea Linux Forum last fall, Linus asked, haven't I been hearing about Tux3 for ten years? I said, no, that was Tux2, completely different. You only heard about Tux3 for six years.
Daniel Phillips


kGraft — live kernel patching from SUSE

SUSE has announced the existence of kGraft, a mechanism for applying kernel patches without the need to reboot the system. It is similar to ksplice in functionality, but the implementation appears to be rather different and the developers plan to try to get it merged into the mainline kernel. "kGraft builds on technologies and ideas that are already present in the kernel: ftrace and its mcount-based reserved space in function headers, the INT3/IPI-NMI patching also used in jumplabels, and RCU-like update of code that does not require stopping the kernel. A kGraft patch is a kernel module and fully relies on the in-kernel module loader to link the new code with the kernel. Thanks to all that, the design can be nicely minimalistic." The first code release is planned for March.


Kernel development news

3.14 Merge window part 3

By Jonathan Corbet
February 5, 2014

By the time Linus closed the merge window and released 3.14-rc1, a total of 10,622 non-merge changesets had been pulled into the mainline kernel repository. That makes this merge window the busiest since 3.10, though it beat 3.13 by a mere 104 patches. At the current rate, 3.10 (11,963 patches pulled during the merge window) is likely to hold its record for some time yet.

Interesting user-visible changes found in the 2000 patches pulled since last week's summary include:

  • The zram compressed swap subsystem (described in this article from 2013) has been moved out of the staging tree and into the core memory management code. Minchan Kim's commit notes that zram is now used heavily in television sets; recent Android handsets have started using it as well.

  • Support for user mode-setting in the Intel i915 driver has been deprecated, in preparation for removing it entirely roughly one year from now. Anybody who depends on this mode would do well to make their needs known in that time.

  • The Btrfs filesystem now provides much more information via sysfs, including supported features, space utilization data, and more. Much of this information is available via ioctl(), but sysfs interfaces can be easier to use in scripts or from the command line.

  • New hardware support includes:

    • Systems and processors: MIPS interAptiv processors.

    • Miscellaneous: ITE IT8603E hardware monitoring chips, Intel BayTrail IOSF-SB mailbox interface controllers, Broadcom BCM281xx watchdogs, Broadcom BCM2835 DMA controllers, MOXA ART SoC DMA controllers, and watchdogs controlled over GPIO lines.

    • Networking: RealTek RTL8821AE Wireless LAN NICs.

    • Video4Linux: TI OMAP4 camera controllers, Broadcom BCM2048 FM radio receivers, Silicon Labs Si4713 FM radio transmitters, Thanko Raremono AM/FM/SW radios, Montage M88DS3103 DVB-S/S2 demodulators, Montage M88TS2022 silicon tuners, and Samsung S5K5BAF camera sensors.

Changes visible to kernel developers include:

  • The "immutable biovec" patch set has been merged; it introduces some significant API changes to the block layer, but it enables the creation of arbitrarily large I/O requests and improves efficiency. See Documentation/block/biovecs.txt for more information.

One final feature that might yet make it into 3.14 is the proposed renameat2() system call, which Linus wanted to review more deeply before committing to. That code might get pulled before 3.14-rc2, but, Linus said, "quite frankly it's more likely to be left pending for 3.15". Other than that, the feature set for the 3.14 kernel should be complete at this time. If the usual schedule holds, this kernel can be expected sometime toward the end of March.


An x32 local exploit

By Jake Edge
February 5, 2014

So far, the x32 ABI (a 32-bit ABI for running on x86 processors in 64-bit mode) is not widely used. Only a few distributions (notably Ubuntu) have enabled support for it in their kernels, which somewhat reduces the impact of a recently discovered local privilege escalation, but the bug has been in the kernel since 2012. It is a nasty hole that required a quick fix for Ubuntu 13.10 (and for two hardware enablement kernels for 12.04 LTS: linux-lts-raring and linux-lts-saucy).

It is the x32 version of recvmmsg() that has the bug. In the compat_sys_recvmmsg() function that is part of the compatibility shim for handling multiple ABIs in the kernel, a user-space pointer for the timeout value is treated as a kernel pointer (rather than copied using copy_from_user()) for the x32 ABI. The value of the timeout pointer is controlled by the user, but it gets passed as a kernel pointer that __sys_recvmmsg() (which implements the system call) will use. The kernel will dereference the pointer for both reading and writing, which allows a local, unprivileged user to get root privileges.

The problem was reported to the closed security@kernel.org and linux-distros mailing lists on January 28 by Kees Cook, after "PaX Team" reported it to the Chrome OS bug tracker (in a still-restricted entry). It was embargoed for two days to give distributions time to get fixes out. After that, "Solar Designer" reported it publicly since Cook was traveling. It is a serious bug, but is somewhat mitigated by the fact that few distributions have actually enabled the ABI.

The x32 ABI came about largely to combat the amount of memory wasted on x86_64 processors for 64-bit pointers (and long integers) in programs that did not require the extra 32 bits for each value. It allows programs to use the extra registers and other advantages that come with x86_64 without paying the penalty of extra memory usage. In theory, that should lead to less memory usage and faster programs due to a smaller cache footprint. So far, though, those benefits are somewhat speculative—and controversial.

X32 does exist in the kernel, however, and can be enabled with the CONFIG_X86_X32 flag. If it is enabled, any user can build an x32 program using GCC with the -mx32 flag. The kernel will recognize such a binary and handle it appropriately.
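As a small illustration of what building with -mx32 changes, the following sketch reports the data model a program was compiled for. It relies on the `__x86_64__` and `__ILP32__` macros that GCC predefines (x32 is the case where both are set); the function names `abi_name()` and `describe_abi()` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Name the ABI this translation unit was compiled for, based on
 * compiler-predefined macros. x32 sets both __x86_64__ and __ILP32__. */
const char *abi_name(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    return "x32";
#elif defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}

/* Format the ABI name along with the two sizes that differ between
 * x32 and ordinary x86-64 builds. Returns snprintf()'s result. */
int describe_abi(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s: pointer=%zu bytes, long=%zu bytes",
                    abi_name(), sizeof(void *), sizeof(long));
}
```

On a system with x32 support in the kernel and an x32-capable C library installed, compiling this with `gcc -mx32` should yield four-byte pointers and longs, while a default 64-bit build yields eight bytes for both.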

The bug was introduced in a February 2012 commit that was adding support for 64-bit time_t values to x32. The problematic code is as follows (from compat_sys_recvmmsg()):

    if (COMPAT_USE_64BIT_TIME)
            return __sys_recvmmsg(fd, (struct mmsghdr __user *)mmsg, vlen,
                                  flags | MSG_CMSG_COMPAT,
                                  (struct timespec *) timeout);

The timeout value is passed to that function as:

    struct compat_timespec __user *timeout

It is clearly annotated as a user-space pointer, but it is simply cast and passed straight to __sys_recvmmsg(). The fix is to use compat_get_timespec() to copy the data from user space before the call to __sys_recvmmsg() and compat_put_timespec() to copy any changes back to user space afterward.
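The difference between the buggy and fixed shapes can be sketched with a toy user-space model. This is not the kernel code: the flat `mem` array, the address constants, and every function name here (`model_access_ok()`, `model_recvmmsg_core()`, `buggy_compat_recvmmsg()`, `fixed_compat_recvmmsg()`, and the `poke_sec()`/`peek_sec()` helpers) are invented for illustration. Offsets below `USER_LIMIT` stand in for user memory and the rest for kernel memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Toy address space: offsets below USER_LIMIT model user memory,
 * offsets at or above it model kernel memory. */
#define MEM_SIZE    256u
#define USER_LIMIT  128u
#define KTSPEC_ADDR 128u  /* kernel-side scratch copy of the timeout */
#define SECRET_ADDR 192u  /* kernel data an attacker would like to change */

unsigned char mem[MEM_SIZE];

struct model_timespec { int32_t tv_sec; int32_t tv_nsec; };

/* Helpers: store or load the tv_sec field of the timespec at an address. */
void poke_sec(uint32_t addr, int32_t sec)
{
    struct model_timespec ts = { sec, 0 };
    memcpy(&mem[addr], &ts, sizeof ts);
}

int32_t peek_sec(uint32_t addr)
{
    struct model_timespec ts;
    memcpy(&ts, &mem[addr], sizeof ts);
    return ts.tv_sec;
}

/* Models an access_ok()-style check: is [addr, addr + len) user memory? */
int model_access_ok(uint32_t addr, uint32_t len)
{
    return addr <= USER_LIMIT && len <= USER_LIMIT - addr;
}

/* Models __sys_recvmmsg(): trusts the address it is handed and writes
 * the remaining time back through it (here, one second less). */
int model_recvmmsg_core(uint32_t trusted_addr)
{
    assert(trusted_addr <= MEM_SIZE - sizeof(struct model_timespec));
    int32_t sec = peek_sec(trusted_addr);
    poke_sec(trusted_addr, sec > 0 ? sec - 1 : 0);
    return 1;  /* pretend one datagram arrived */
}

/* Buggy shape: the caller-supplied address is used as if it were trusted. */
int buggy_compat_recvmmsg(uint32_t user_addr)
{
    return model_recvmmsg_core(user_addr);
}

/* Fixed shape: validate, copy in, call the core, copy back out. */
int fixed_compat_recvmmsg(uint32_t user_addr)
{
    uint32_t len = sizeof(struct model_timespec);

    if (!model_access_ok(user_addr, len))
        return -EFAULT;
    memcpy(&mem[KTSPEC_ADDR], &mem[user_addr], len);      /* copy in */
    int n = model_recvmmsg_core(KTSPEC_ADDR);
    if (n > 0)
        memcpy(&mem[user_addr], &mem[KTSPEC_ADDR], len);  /* copy out */
    return n;
}
```

In this model, handing `SECRET_ADDR` to the buggy variant changes the "kernel" value stored there, while the fixed variant rejects the address with -EFAULT and leaves it alone. The real fix follows the same copy-in, call, copy-out pattern using the kernel's own helpers.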

Exploits have started to appear (for example, one by rebel and another by saelo). The basic idea is to use the fact that recvmmsg() will write the amount of time left in the timeout to the location specified by the timeout pointer. Since the value of that pointer is controlled by the user, it can be arranged to write known values (another exploit-controlled address, say) to somewhere "interesting", for example to a function pointer that gets called when the /proc/sys/net/core/somaxconn file is opened (as rebel's exploit does). The program will already have arranged to have "interesting" code (to gain root privileges) located at that address. When the function is called by the kernel via that pointer, the exploit's code is run.

Users of Ubuntu 13.04 should note that it reached its end of life two days before the bug was found, so no update for that kernel has been issued. One possible solution for those who have not yet upgraded to 13.10 (or are running some other distribution kernel and do not want to patch and build their kernel) is a module that disables the x32 version of the recvmmsg() system call.

As PaX Team noted in the report (quoted by Solar Designer), the presence of this bug certainly calls into question how much testing (fuzz testing in particular) has been done on the x32 ABI. For a bug of that nature to exist in the kernel for two years would also seem to indicate that it isn't just testing that has fallen by the wayside; heavy use would also seem to be precluded. In any case, the problem was found, reported, and fixed. Now it is up to users (and to any distributions other than Ubuntu, since we have received no security advisories beyond those mentioned above) to update their kernels.


ARM, SBSA, UEFI, and ACPI

By Jonathan Corbet
February 5, 2014

For some years now we have been promised that ARM-based servers were going to start showing up in data centers. Opinions differ on whether ARM processors can be successful in this market, but there tends to be widespread agreement on a related point: the free-form, highly differentiated nature of ARM-based systems would make them painful to support in large, server-oriented environments, where users expect to be able to treat servers like interchangeable parts. ARM Ltd. clearly understands this problem; its recently announced "Server Base System Architecture" (SBSA) is an attempt to improve the situation. SBSA has been greeted with generally optimistic reviews, but the requirements that are coming along for the ride may yet stir things up in the development community.

In truth, it can be hard to say for sure what the SBSA mandates; the standard is currently kept behind a restrictive license, limiting the number of people who have read it. Arnd Bergmann described it this way (in the comments):

SBSA describes various hardware components that are required for a compliant server. Things like CPU, PCIe, timers, IOMMU, UART, watchdog, and interrupts are described in enough detail that it should be possible to take a compliant OS image and boot to the stage where you can load drivers for all other components that are not standardized...

Arnd went on to describe the requirements as "extremely reasonable". Olof Johansson, one of the maintainers of the arm-soc kernel tree, was also supportive of the idea:

It is an incredibly important document, because it does away with so much of the variables where ARM vendors in the past have chosen to differentiate between each other for no useful purpose. It allows us to write software that is much simpler, and that has a better chance to work across a large range of hardware with very small changes made over time.

In short, the SBSA is trying to create a base platform that can be assumed to be present as part of any compliant system. ARM has always lacked that platform, which is part of why supporting ARM systems has traditionally been a messy affair. To the extent that the SBSA succeeds, it will make life easier for kernel developers, hardware manufacturers, and server administrators; it is hard to be upset about that.

Nonetheless, the SBSA announcement has stirred up some heated discussion in the community. But the SBSA is mostly guilty by association; the controversial part of the platform is the firmware requirements, which are not addressed by the SBSA at all. Instead, these requirements will be released as part of a separate specification. The details of that specification are not known, but it has been made clear that it will mandate the use of both UEFI and ACPI on compliant systems.

UEFI is the low-level firmware found on most current PC systems. Like any firmware, UEFI has caused its share of misery, but, for the most part, developers are fine with its use in this context. UEFI works well enough, it has an open-source reference implementation, and supporting UEFI is not hard for the kernel to do. So there is no real opposition to the idea of supporting UEFI on ARM systems.

ACPI (the "Advanced Configuration and Power Interface") is another story. Getting ACPI working solidly on x86 systems was a long and painful process. Its detractors cite a few reasons to believe that it could be just as bad — if not worse — in the ARM world. For example, most ARM system-on-chip (SoC) vendors currently have no experience with ACPI, so they are going to have to come up to speed quickly, repeating lots of mistakes on the way. Each one of those mistakes is likely to find its way into deployed systems, meaning that the kernel would have to support them indefinitely.

There are also concerns about how well an ACPI-based ARM platform will come together. In the PC world, it became clear fairly quickly that the specification only meant so much when it came to hardware support. The real acid test was not compliance with a spec; instead, it was the simple question of "does Windows run on it?" Once Windows worked, firmware authors tended to stop fixing things. That led to numerous situations where the Linux kernel has to carefully do things exactly as Windows does, since that's the only well-tested mode of operation. Windows compatibility is not the most satisfying compliance test out there, but it did result in ACPI implementations converging sufficiently to allow them to be supported in a generic manner.

Windows does not have the same dominating position in the ARM server market; indeed, it's not clear that Windows will be offered on such systems at all. It is certainly possible that, say, Red Hat Enterprise Linux could play a similar role in this space. But it's also possible that vendors will just try to push lots of patches into the kernel to support their specific ACPI implementations. The result could be an incompatible, bug-ridden mess that takes many years to settle out.

Finally, there is the question of whether ACPI is needed at all. ACPI is, in the end, a standardized way to enable the operating system to discover and initialize the system's hardware. But, ACPI critics point out, the ARM architecture already has such a mechanism: device trees. The device tree work is reaching a point where it is reasonably mature; as Olof recently noted in a pull request to Linus:

New boards and systems continue to come in as new devicetree files that don't require corresponding C changes any more, which is indicating that the system is starting to work fairly well.

The developers who have been working on getting this system working well are now asking: why should that work be pushed aside in favor of a PC standard with no history in the ARM world? Among certain developers one can easily pick up a feeling that the kernel should simply refuse ARM ACPI support and mandate the use of device trees on all ARM systems.

In the end, it is hard to see any such thing happening; Linux kernel development has almost always been done in such a way as to favor running on as many systems as possible. And there may well be technical reasons for favoring ACPI on some systems, especially in situations where strict compatibility has to be maintained for years. As Grant Likely put it in a lengthy posting about the upcoming firmware standards, ACPI can make it easier for manufacturers to keep things compatible:

To begin with they already have hardware and process built around ACPI descriptions. Platform management tools are integrated with ACPI and they want to use the same technology between their x86 and ARM product offerings. They also go to great lengths to ensure that existing OS releases will boot on their hardware without patches to the kernel. Using ACPI allows them limited control over low level details of the platform so that they can abstract away differences between systems.

As Grant points out, that abstraction runs counter to the way things have traditionally been done on ARM-based systems; normal practice is to go through a great deal of pin configuration, regulator setup, clock programming, and more just to get things into an operational state. ACPI pushes a lot of that work into the firmware, taking it out of kernel developers' hands. That, perhaps, is where some of the resistance comes from: kernel developers like that control and are reluctant to cede it to firmware authors. It just doesn't feel right if you don't have to establish the right pinmux configuration before anything will work.

Still, ARM servers with ACPI are coming, and the kernel will almost certainly support them. The kernel will also, of course, continue to support device-tree-based systems; the chances of ACPI moving into the embedded world in the near future seem relatively small. After a while, ACPI on ARM will just be another configuration supported by the kernel, and people will be wondering why it was ever controversial. But "a while" may turn out to be a longer period of time than some people expect.


Patches and updates

Kernel trees

Linus Torvalds: Linux 3.14-rc1 is out
Sebastian Andrzej Siewior: 3.12.9-rt13
Kamal Mostafa: Linux 3.8.13.17

Documentation

Michael Kerrisk (man-pages): man-pages-3.57 is released

Security-related

Stephan Mueller: CPU Jitter RNG

Page editor: Jonathan Corbet


Copyright © 2014, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds