Kernel development [LWN.net]

Kernel release status

The current development kernel is 4.8-rc6, released on September 11. "I still haven't decided whether we're going to do an rc8, but I guess I don't have to decide yet. Nothing looks particularly bad, and it will depend on how rc7 looks."

The known regression list for 4.8 has ten entries as of September 11.

Stable updates: 3.14.79, the last of the 3.14.x series, was released on September 11.

The 4.7.4 and 4.4.21 stable updates are in the review process as of this writing; they can be expected at any time.

Quotes of the week

But, it turned out that they would only use the kernel series for a while during the development phase, and then stop after they "shipped" the device. Look at all of the Android phones sitting on old obsolete versions of 3.4 and 3.10 stable kernels. They aren't even updated to newer ones, and so, it didn't really help all that much. Even though I am fixing security bugs for these kernels, no one pushes them to the users. I have an example of a security bug that a Google researcher found in a 3.10 kernel (but not mainline) I fixed and pushed out an update, but never got picked up in Nexus phones until 6 months later when I found the right person/group to poke within Google.

That was a 6 month window where anyone could have gotten root on your phone, easily.

People say "look, we are using an LTS kernel in our product, all must be good!" but if they don't update it, it's broken and insecure, and really no better than if they were using 3.10.0 in a way.

— Greg Kroah-Hartman

All I have left to say is:

     yell_WTF(nr_wtf_moments);

I leave the value of the function argument to your imagination.

— Thomas Gleixner

Lindholm: UEFI Driver Development - Part 1

Leif Lindholm starts a series on writing UEFI drivers. "So, having never actually written a UEFI driver from scratch, and most of the drivers I have come across having really been platform drivers, I figured that would be a good start to write a standalone driver from scratch. And the outcome is this slightly hands-on blog post series."

Exclusive page-frame ownership

By Jonathan Corbet
September 14, 2016

The objective of most attacks on the kernel is to run code of the attacker's choosing with kernel privileges. It is thus unsurprising that many hardening techniques are aimed at preventing attackers from putting arbitrary code in places where a compromised kernel might be convinced to run it. Unfortunately, the design of the kernel's memory-management subsystem makes it possible for many kernel access-prevention techniques to be circumvented. A patch set is circulating that attempts to close that hole, but it brings an interesting question of its own: is the kernel community now willing to take a performance hit for better security?

An attacker wanting to get the kernel to run arbitrary code faces a problem: where can that code be put so that the kernel might run it? If the kernel can be convinced to run code found in user space, that problem becomes much easier to solve, since placing code in user-space memory is something that anybody can do. Since user-space memory remains mapped while the processor is running in kernel mode, all that needs to be done is to convince the kernel to jump to a user-space address. Years ago, it was possible to simply map the page at address 0 and find a bug that would cause the kernel to jump to a null pointer. Such simple attacks have been headed off, but more complex exploits are still common.

Obviously, the best solution is to ensure that the kernel will never try to jump to a user-space address. If one accepts that there will always be bugs, though, it makes sense to add other defenses, such as simply preventing the execution of user-space memory by the kernel. The PaX KERNEXEC and UDEREF mechanisms are designed to prevent this kind of user-space access. More recently, the processor manufacturers have gotten into the game as well; Intel now has supervisor mode access prevention and supervisor mode execute protection, while ARM has added privileged execute-never. On systems where these mechanisms are fully implemented, it should be impossible for the kernel to execute code found in user-space memory.

Except, as this paper from Vasileios P. Kemerlis et al. [PDF] points out, there's a loophole. User-space memory is accessed via a process's page tables, and the various access-prevention mechanisms work to block kernel access via those page tables. But the kernel also maintains a linear mapping of the entire range of physical memory (on 64-bit systems; the situation on 32-bit systems is a bit more complicated). This mapping has many uses within the kernel, with page-level memory management being near the top of the list. It provides a separate address for every physical page in the system. Importantly, it's a kernel-space address and, on some systems (x86 before 3.9 and all ARM), this memory range is executable by the kernel.

If an attacker can cause the kernel to jump into the direct mapping, none of the user-space access-prevention mechanisms will apply, even if the target address corresponds to a user-space page. So the direct mapping offers a convenient way to bypass these protections, with only one little catch: an attacker must be able to determine the physical address of the page containing the exploit code. As the paper points out, the pagemap files under /proc will provide that information, and, while these files can be disabled, distributions tend not to do that. So, on most systems, everything is in place to enable an attacker to exploit a bug that can cause a jump to an arbitrary address, and the existing access-prevention mechanisms are powerless to stop it.
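To make the pagemap step concrete, here is a minimal sketch of how one 64-bit pagemap entry can be decoded and turned into a direct-map address. It assumes the entry layout given in the kernel's pagemap documentation (bit 63 means the page is present, bits 0-54 hold the page-frame number). The function names are illustrative, and the direct-map base is an assumption: it is the traditional x86-64 value and will differ on other architectures or when the kernel's address layout is randomized.

```python
import os
import struct

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")
PRESENT_BIT = 1 << 63            # entry bit 63: page is present in RAM
PFN_MASK = (1 << 55) - 1         # entry bits 0-54: page-frame number

# Assumption: traditional start of the x86-64 direct mapping.
ASSUMED_DIRECT_MAP_BASE = 0xFFFF880000000000


def decode_pagemap_entry(entry):
    """Return (present, pfn) for one 64-bit pagemap entry."""
    present = bool(entry & PRESENT_BIT)
    # For non-present pages the low bits do not hold a PFN, so report 0.
    pfn = entry & PFN_MASK if present else 0
    return present, pfn


def read_pagemap_entry(vaddr, pid="self"):
    """Read the raw pagemap entry covering virtual address vaddr."""
    offset = (vaddr // PAGE_SIZE) * 8      # one 8-byte entry per virtual page
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(offset)
        # The kernel may report a PFN of 0 to readers lacking privilege.
        return struct.unpack("Q", f.read(8))[0]


def direct_map_alias(pfn, offset_in_page=0, page_size=PAGE_SIZE,
                     base=ASSUMED_DIRECT_MAP_BASE):
    """Kernel-space address at which the direct mapping exposes this frame."""
    return base + pfn * page_size + offset_in_page
```

The point of the sketch is only that the translation is simple arithmetic once the page-frame number is known; nothing here is specific to the exploit described in the paper.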

(Life gets a little harder on current x86 kernels, where it is no longer possible to directly execute code via the direct mapping. In such cases, the attacker must resort to return-oriented programming instead — not a huge obstacle for many attackers.)

The solution, as described in the paper and implemented in the exclusive page frame ownership (XPFO) patch set posted by Juerg Haefliger, is to take away the back-door access to user-space pages via the direct mapping. The mechanism is fairly simple in concept. Whenever a page is allocated for user-space use (something the kernel already indicates with the GFP flags in the allocation request), the direct mapping for that page is removed. Thus, even if an attacker can generate the directly mapped address for the page and get the kernel to jump there, the kernel will fault due to lack of access permissions to that page. When user space frees a page, it will be zeroed (to prevent attacks via hostile code left in the page) and returned to the direct map.

There are times when the kernel must access user-space memory, of course; the copy_to_user() and copy_from_user() functions are obvious examples. In such cases, the direct mapping is restored for the duration of the operation.
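The bookkeeping described above can be illustrated with a toy model. This is a sketch in Python, not the patch set's code: the class, its methods, and the exception are all hypothetical names, and "the direct map" is reduced to one boolean per page frame.

```python
from contextlib import contextmanager

PAGE_SIZE = 4096


class DirectMapFault(Exception):
    """Raised when the toy kernel touches a frame absent from the direct map."""


class ToyXPFO:
    """Toy model of XPFO bookkeeping (hypothetical, for illustration only)."""

    def __init__(self, nr_pages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(nr_pages)]
        self.in_direct_map = [True] * nr_pages
        self.free_list = list(range(nr_pages))

    def alloc_page(self, for_user):
        pfn = self.free_list.pop()
        if for_user:
            # User-space allocation: drop the kernel's direct mapping.
            self.in_direct_map[pfn] = False
        return pfn

    def free_page(self, pfn):
        if not self.in_direct_map[pfn]:
            # Page was owned by user space: zero it, then restore the mapping.
            self.pages[pfn][:] = bytes(PAGE_SIZE)
            self.in_direct_map[pfn] = True
        self.free_list.append(pfn)

    def kernel_access(self, pfn):
        """Access through the direct map; faults if the frame is unmapped."""
        if not self.in_direct_map[pfn]:
            raise DirectMapFault(pfn)
        return self.pages[pfn]

    @contextmanager
    def temporary_mapping(self, pfn):
        """Restore the mapping only for the duration of a copy operation."""
        was_mapped = self.in_direct_map[pfn]
        self.in_direct_map[pfn] = True
        try:
            yield self.pages[pfn]
        finally:
            self.in_direct_map[pfn] = was_mapped
```

In this model, a stray kernel_access() to a user-owned frame raises DirectMapFault, while a copy performed inside temporary_mapping() succeeds and leaves the frame unmapped again afterward.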

Naturally, there is a performance cost to this. The mapping and unmapping of pages in the kernel's address space will slow things down somewhat, as will the zeroing of returned user-space pages. Perhaps more significant, though, is a change in how the direct mapping is implemented. Normally, the kernel creates this mapping with huge pages; that, among other things, greatly reduces the pressure on the processor's translation lookaside buffer (TLB) when the direct mapping is accessed. But use of huge pages is incompatible with adding and removing mappings for individual (small) pages in that range, so, with XPFO, the huge-page mappings have to go. There is also some increased memory overhead resulting from the need to store more per-page information. All told, enabling XPFO has a performance cost of up to about 3% in the worst case, though most of the benchmarks reported in the paper suffered much less than that.
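A little arithmetic shows why losing huge pages matters. The sketch below counts how many page-sized mappings are needed to cover a given amount of memory; the page sizes are the usual x86-64 ones and the 16 GiB figure is an arbitrary example, neither taken from the paper.

```python
KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30

# Assumption: the page sizes commonly available on x86-64.
PAGE_SIZES = (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB))


def mappings_needed(memory_bytes, page_bytes):
    """Number of page-sized mappings required to cover memory_bytes."""
    return -(-memory_bytes // page_bytes)    # ceiling division


def compare(memory_bytes):
    """Mapping count for each page size, keyed by a human-readable label."""
    return {label: mappings_needed(memory_bytes, size)
            for label, size in PAGE_SIZES}


if __name__ == "__main__":
    for label, count in compare(16 * GIB).items():
        print(f"{label:>6} pages: {count:>9,} mappings")
```

For 16 GiB of memory, that is 16 mappings with 1 GiB pages, 8,192 with 2 MiB pages, and 4,194,304 with 4 KiB pages, so each TLB entry covers far less of the direct map once the huge-page mappings are gone.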

The patch set needs some completion work before it can be seriously considered for merging into the mainline. Once that point comes, one can assume that the conversation will hinge on how effective it is at preventing exploits and whether it is worth the performance cost. The fact that the slowdown for kernel builds is 2.5% could prove to be a bit of an obstacle in this discussion. A performance hit on that scale is a hard thing to swallow, but so are successful exploits. Which pill proves more bitter remains to be seen as the patch set progresses.

The need for TTY slave devices

September 14, 2016

This article was contributed by Neil Brown

A typical computer system has numerous "buses" to carry data and control from the CPU out to various peripheral devices and back. Linux generally supports these buses by having "master" drivers to manage the hardware at the CPU end of the bus, and "slave" drivers to manage the peripheral. There is one particular bus for which there are no slave drivers, at least not in the normal sense, but for which there is recurring interest in supporting the creation of such drivers. The asynchronous character-oriented serial bus, one of the oldest bus types that is still in wide use today, is managed quite differently from other buses, but might soon be enhanced to better meet current needs.

One difficulty I have in discussing this bus is that there does not seem to be a suitably generic name. Once upon a time I would have called it a "serial connection", but today most connections are serial, whether SATA, SAS (serial attached SCSI), Ethernet, or I2C. So that name doesn't work. RS-232 was once a popular name, but that specifies higher voltage levels and more wires than are normally found on the intra-board connections that we will be interested in. The name UART, standing for Universal Asynchronous Receiver/Transmitter, is at about the right level of generality, but really refers to the controlling hardware rather than the bus itself. TTY, an abbreviation for "teletype", is the name I will largely use, not because there are any teletypewriters connected to any computers I have used in a long time, but because it is a name that is widely used in Unix and Linux history and in present implementations, and it is nice and short.

When a computer system has some TTY ports, Linux will discover these ports and create devices like /dev/ttyS0 to allow them to be managed. In general, Linux knows nothing about what might be connected to the port. One exception is that a "console" might be known to be attached to one of the ports, and Linux will then send kernel messages to that port. In other cases, Linux needs to be explicitly told what is attached if it is expected to handle it in any particular way.

Line disciplines

Linux doesn't always need to know what is attached to a TTY port — a program in user space can open the /dev/ttyXX device and read or write as appropriate. Sometimes, though, it can be helpful for the kernel to take a larger role; for those times there are "line disciplines", which is really just another name for "protocol handlers". As an example: dial-up networking uses a modem to connect a TTY port on one machine to a similar port on another machine. Once a connection is established over the modem, the PPP protocol is often used to allow Internet traffic to flow between the computers. As this requires tight integration with the networking subsystem in the kernel, it is easiest if the PPP protocol itself is handled directly by Linux. To this end, there is an N_PPP line discipline. Once the connection is established, pppd (the user-space daemon for managing the connection) sets the line discipline to N_PPP and all further traffic is handled directly by the kernel.

Another line discipline that was once more widely used than it is now is the N_MOUSE protocol for communicating with serial-attached mice. N_MOUSE passes data from the TTY port through to the "input" subsystem so it appears on /dev/input/mouse0 or similar and can be easily used by your windowing system. There is a collection of other line disciplines for various serial protocols. Each one needs to be explicitly activated by a program like pppd for N_PPP, inputattach for N_MOUSE, and hciattach for N_HCI (the protocol for communicating with Bluetooth transceivers). The line discipline only remains active for as long as that program keeps the TTY device open.
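The activation step that these programs perform boils down to a TIOCSETD ioctl() on the open TTY. What follows is a minimal Python sketch, assuming Linux and the line-discipline numbers from the kernel's tty header. The helper names are made up for illustration, and the fallback ioctl numbers are the common Linux values, which differ on some architectures.

```python
import fcntl
import os
import struct
import termios

# Line-discipline numbers as defined in the kernel's tty header.
N_TTY, N_MOUSE, N_PPP, N_HCI = 0, 2, 3, 15
LDISC_NAMES = {N_TTY: "N_TTY", N_MOUSE: "N_MOUSE", N_PPP: "N_PPP", N_HCI: "N_HCI"}

# Fall back to the common Linux ioctl numbers if termios lacks the constants.
TIOCSETD = getattr(termios, "TIOCSETD", 0x5423)
TIOCGETD = getattr(termios, "TIOCGETD", 0x5424)


def ldisc_name(number):
    """Human-readable name for a line-discipline number."""
    return LDISC_NAMES.get(number, f"unknown({number})")


def get_line_discipline(fd):
    """Ask the kernel which line discipline is attached to the TTY fd."""
    buf = fcntl.ioctl(fd, TIOCGETD, struct.pack("i", 0))
    return struct.unpack("i", buf)[0]


def set_line_discipline(fd, ldisc):
    """Attach a line discipline; it stays active only while fd is open."""
    fcntl.ioctl(fd, TIOCSETD, struct.pack("i", ldisc))


def demo_on_pty():
    """Report the discipline of a fresh pseudo-terminal (no hardware needed)."""
    master, slave = os.openpty()
    try:
        return ldisc_name(get_line_discipline(slave))
    finally:
        os.close(master)
        os.close(slave)
```

Calling demo_on_pty() on a Linux system normally reports N_TTY, the default discipline. Requesting N_PPP or N_HCI with set_line_discipline() generally requires the corresponding kernel module and suitable privileges, which is part of why dedicated helper programs exist.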

If line disciplines were being invented today, they would almost certainly be drivers on a bus that would get bound to the hardware either automatically, or by writing to a bind file in sysfs.

Problematic use cases

Though the mechanism for attaching a line discipline to a TTY port allows a lot of serial-attached devices to be used quite effectively, there are two areas where the current solution is not entirely satisfactory, which has motivated various people to seek improvements. These areas involve transparent discovery and sideband controls such as power management.

If I have a computer system, such as a mobile device, which has, for example, a Bluetooth transceiver permanently attached to a UART, then I shouldn't have to tell the software running on that device about the hardware arrangement. The firmware on the device should know about the Bluetooth device, possibly from nodes in a device-tree description of the hardware, or possibly from information in the ACPI tables, and something should read that description and configure the TTY port appropriately. It might be possible for a user-space program to extract the information and run hciattach, but as firmware tables are normally interpreted by the kernel, and as hciattach does little more than request the N_HCI line discipline, it seems sensible to have the kernel set everything up transparently. The "little more" that hciattach does might involve setting a bit rate, performing some initialization, or uploading firmware. All of these are the sorts of things the kernel already does, so it would be no extra burden.

Even in cases where the device can be managed without a dedicated line discipline, there might be a need to do more than just send and receive bytes. Power management is important in all computer systems these days and, while some serial-attached devices can power themselves up or down in response to commands over the serial connection, this capability is not universal. Were we using RS-232, the DTR (data terminal ready) line would probably be used to manage power, but many UARTs do not have a DTR line, and asserting a signal is not necessarily the best way to control power for an integrated device. Device power management in Linux is generally handled by the device driver for the particular device, since it knows the needs and is able to assert a GPIO output, activate a regulator, or whatever else is needed. But, with TTY ports, there is no slave device driver to perform these tasks.

Both of these difficulties could be solved if a TTY were treated more like a bus that could have slave devices attached as children. The configuring of child devices is the normal way that device information from device tree or ACPI tables is handled, and these devices would be well placed to select a non-default line discipline or to control the power management of the device when it is opened or activated.

Where to put the device

Though I was not involved in the most recent discussions on this topic, I have attempted to make progress in this problem space in the past; a recurring problem is that it wasn't obvious, to me at least, what shape the solution should take. Above, I have described the need as being for a "TTY bus" with "slave devices" but that understanding only came about after several failures, and it is not yet certain that it is the best solution.

Linux has a concept of a "platform bus", which is a "pseudo-bus" that is described more by examples than by a concrete purpose. It is contrasted with "large formally specified [buses] like PCI or USB." A driver to control a GPIO line to manage the power of a GPS device attached to a TTY could easily be seen as part of the "platform" rather than part of a genuine bus, particularly if you didn't think of a TTY as a "bus", which I certainly didn't. So an early attempt created a platform device to handle power management and taught the TTY driver to tell the attached platform device when it was opened or closed. This didn't address the auto-detection need, which did not concern me at the time. The patch was vetoed by Greg Kroah-Hartman, both when I proposed it and when it was recently re-proposed by Sebastian Reichel, who is trying to make the Bluetooth transceiver on the Nokia N950 work correctly. As Kroah-Hartman put it: "I've said before that a "serial" bus should be created".

Rob Herring responded to this challenge and proposed a "UART slave device bus" that is not entirely unlike something I proposed last year. Linux contains a "serial core" subsystem that supports a wide range of serial character devices and which provides a uart_port abstraction. This is separate from the "tty" subsystem, which provides a tty_port, handles all the peculiarities of POSIX TTY devices, and manages the line disciplines. As all the devices that anyone wanted to create a slave device for were UARTs, it seemed natural to both Herring and myself to make changes at the uart_port level.

Alan Cox vetoed this one. In his view, the UART isn't the right place to attach slaves because not all serial devices use the UART code, or not in the same way. In particular, USB-attached serial ports do not use the UART code at all. Cox recalled that: "As I told you over six months ago uart_port is not the correct abstraction. You need to be working at the tty_port layer," and again: "This again is why it needs to be at the tty_port layer." The tty_port interface, provided by the TTY layer, is clearly the more general interface for serial devices ... or is it?

The serio bus

There are some serial character devices that don't use UARTs and don't even interface with the TTY layer. The most common example is the PS/2 mouse. The over-the-wire protocol used by a PS/2 mouse is similar to that used by serial-port mice, but is more constrained and so can be supported with simpler hardware than a UART. In Linux, the driver for PS/2 mouse hardware (and PS/2 keyboards as well) is attached to the serio (serial I/O) bus, which feeds into the input subsystem.

The N_MOUSE TTY line discipline mentioned earlier is really a generic gateway from TTY devices into the serio bus. It was designed for use with serial mice, but could be used with any device with a similar character-based interface. Herring, with a little prompting from Arnd Bergmann, wondered if the serio bus could become the place to attach the slave devices that we seem to want. To this end, he prepared some patches that allow device tree configuration to direct a serio device to attach to the HCI interface for the Bluetooth subsystem. With these patches it is still necessary to run inputattach to gateway the TTY to the serio bus using the N_MOUSE line discipline. Herring claims: "Once a tty_port based serio port driver is in place, this step will not be needed". In some ways, this seems like a step in the right direction; in others, it seems like it might just be moving the problem to a new location.

While this serio approach could work well for auto-configuration of Bluetooth devices, it isn't obvious that it works well for power management of GPS receivers using sideband signaling. For a GPS receiver we really still need the TTY device, /dev/ttyXX, to exist much as it does now. We don't want to attach an alternate line discipline, because the kernel doesn't understand the protocols (such as NMEA and various binary protocols) that GPS devices use. The current solution of running gpsd to interpret these protocols is quite effective. Though Marcel Holtmann attested that he is "not convinced that GPS should be represented as /dev/ttyS0 or similar TTY" and Kroah-Hartman expressed support for this position, the creation of a GPS device type seems to be a separate need from allowing a device to be powered on when a TTY is opened, and powered off when it is closed.

Ideas for forward progress

Though this recent conversation does not seem to have produced any code that is likely to get merged, it did bring up a lot of ideas and a lot of constructive criticism for why some of the proposals were not satisfactory. One of the most discouraging responses one can get when submitting a patch is to have it rejected with no clear explanation of why it was rejected. That didn't happen here. Of all the feedback that was provided, possibly the most concretely useful was Cox's insistence that tty_port should be the basis of a new bus ("The basic problem is that the bus should be tty_ports not uart, fix that and the rest starts to make sense.") and his explanation of the role of the tty_port as the item in the device model which "has the lifetime of the hardware".

When serial hardware is discovered, whether it is a UART, a USB-attached serial port, or something else, a tty_port is created. It is currently registered as a character device so that an entry appears in /dev, which can then be opened. When it is opened, a tty_struct is attached, and line disciplines can be attached to that. The right approach seems to be to insert a bus abstraction under the tty_port so that different drivers can be bound to the port. The default driver would register a character device that would attach a tty_struct when it was opened. Other drivers might connect through to the Bluetooth subsystem, or might interpose some power management controls and then register a TTY character device.
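The shape of that idea can be sketched as a toy model in which drivers are matched against a port and the most specific match is bound. Everything here is hypothetical: the class names, the firmware-node keys, and the "compatible" string are invented for illustration and do not correspond to kernel structures or to any posted patch.

```python
class TtyPort:
    """Stand-in for a discovered serial port (not the kernel's tty_port)."""

    def __init__(self, name, firmware_node=None):
        self.name = name
        # Stand-in for device-tree or ACPI information about the attached device.
        self.firmware_node = firmware_node or {}
        self.driver = None
        self.actions = []


class CharDevDriver:
    """Default driver: expose the port as an ordinary TTY character device."""
    name = "chardev"

    def matches(self, port):
        return True

    def probe(self, port):
        port.actions.append(f"register /dev/{port.name}")


class BluetoothDriver:
    """Connects the port straight to the Bluetooth subsystem, with no /dev node."""
    name = "bluetooth"

    def matches(self, port):
        return port.firmware_node.get("compatible") == "example,bt-uart"

    def probe(self, port):
        port.actions.append("hand port to Bluetooth subsystem")


class PoweredDriver(CharDevDriver):
    """Interposes power control, then registers the usual character device."""
    name = "powered-chardev"

    def matches(self, port):
        return "power-gpio" in port.firmware_node

    def probe(self, port):
        gpio = port.firmware_node["power-gpio"]
        port.actions.append(f"toggle GPIO {gpio} on open/close")
        super().probe(port)


class TtyBus:
    """Binds the first matching driver; the catch-all default comes last."""

    def __init__(self):
        self.drivers = [BluetoothDriver(), PoweredDriver(), CharDevDriver()]

    def add_port(self, port):
        for driver in self.drivers:
            if driver.matches(port):
                driver.probe(port)
                port.driver = driver.name
                return port
```

In this model, a port with no firmware description still ends up as a plain /dev entry, which mirrors the requirement that existing users of TTY devices see no change.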

One reason this hasn't been done already is that the TTY layer is a little complicated. tty_port and tty_struct are closely intertwined and separating them, as seems to be required, is not a task for the timid. Cox has posted an RFC patch that takes a step in this direction by allowing a tty_port to be used without an open file handle. There is a lot more that would need to be done, but this is a valuable start, particularly as it comes from someone with a deep knowledge of the TTY layer who can probably see the end game more clearly than the rest of us.

The conversation has died down for the moment. That might mean that people have been distracted by more urgent issues, or it could mean that now is a time for coding rather than discussion. This is a topic that has arisen several times in the past and, while it is generally met with enthusiastic in-principle agreement, it does not seem to have been quite important enough to anyone to push through the various barriers to find a solution that is broadly acceptable. Maybe this time will be different.

In a conversation on the Kernel Summit email list concerning the different sorts of "stable" kernels that vendors use and how much is being backported to them, Tim Bird lamented that "there are still significant areas where the mainline kernel just doesn't have the support needed for shipping product." The mainline kernel community's emphasis, appropriate as it is, on requiring well-designed and fully general solutions inevitably means that some functionality takes a while to land. This means that vendors with tight deadlines need to choose between staying close to mainline or having all the functionality they want. It is understandable that they will often choose the latter. Finding ways to minimize the need for this choice is one of the ongoing challenges for the kernel community and one that we might see playing out, in a small way, with the need for TTY slave devices.

Patches and updates

Linus Torvalds Linux 4.8-rc6 Sep 11

Greg KH Linux 3.14.79 Sep 11

Punit Agrawal Add support for monitoring guest TLB operations Sep 13

Catalin Marinas arm64: Privileged Access Never using TTBR0_EL1 switching Sep 13

Srinivas Pandruvada Support Intel Turbo Boost Max Technology 3.0 Sep 08

Kyle Huey prctl,x86 Add PR_[GET|SET]_CPUID for controlling the CPUID instruction. Sep 11

Andy Lutomirski thread_info cleanups and stack caching Sep 13

Nicolai Stange adapt clockevents frequencies to mono clock Sep 09

Viresh Kumar cpufreq: Upstream Android's Interactive governor Sep 14

Deepa Dinamani Introduce current_time() api Sep 14

Byungchul Park lockdep: Implement crossrelease feature Sep 13

Lorenzo Pieralisi ACPI IORT ARM SMMU support Sep 09

Alexandre TORGUE Add STM32 EXTI interrupt controller support Sep 09

vadimp@mellanox.com i2c: mux: mellanox: add driver Sep 09

Brendan Higgins i2c: aspeed: added driver for Aspeed I2C Sep 09

Jian Yuan pwm: add pwm driver for HiSilicon BVT SOCs Sep 12

Chris Zhong Rockchip Type-C DisplayPort driver Sep 09

YT Shen MT2701 DRM support Sep 12

Adit Ranadive Add Paravirtual RDMA Driver Sep 11

Ram Amrani QLogic RDMA Driver (qedr) RFC Sep 12

kernel@martin.sperl.org thermal: bcm2835: add thermal driver Sep 09

Tomas Winkler Replay Protected Memory Block (RPMB) subsystem Sep 13

Stephen Boyd usb: misc: Add a driver for TC7USB40MU Sep 13

John Crispin net-next: dsa: add QCA8K support Sep 14

Benjamin Gaignard STIH CEC driver Sep 14

William Breathitt Gray Add IIO support for counter devices Sep 14

dimitrysh@google.com of: Overlay manager Sep 08

Rafael J. Wysocki Functional dependencies between devices Sep 08

Heikki Krogerus USB Type-C Connector class Sep 09

Geert Uytterhoeven spi: Add slave mode support Sep 12

Jon Hunter PM / Domains: Add support for removing PM domains Sep 12

Mika Westerberg ACPI: Add support for WDAT (Watchdog Action Table) Sep 13

Kishon Vijay Abraham I pci: support for configurable PCI endpoint Sep 14

Greg KH Greybus driver subsystem for 4.9-rc1 Sep 14

Mauro Carvalho Chehab Create a book for Kernel development Sep 12

Dave Hansen [RFCv5] add manpages for Memory Protection Keys Sep 13

Christoph Hellwig iomap based DAX path Sep 09

Christoph Hellwig blk-mq: allow passing in an external queue mapping V3 Sep 14

Richard Weinberger ubifs: Add overlayfs support Sep 13

Anand Jain Preliminary BTRFS Encryption Sep 13

Mel Gorman Reduce tree_lock contention during swap and reclaim of a single file v1 Sep 09

Juerg Haefliger Add support for eXclusive Page Frame Ownership (XPFO) Sep 14

kan.liang@intel.com Kernel NET policy Sep 12

Tom Herbert net: ILA resolver and generic resolver backend Sep 09

Daniel Mack Add eBPF hooks for cgroups Sep 12

Jamal Hadi Salim net_sched: Introduce skbmod action Sep 12

Mickaël Salaün Landlock LSM: Unprivileged sandboxing Sep 14

Pavel Emelyanov CRIU v2.6 Sep 12
