LWN.net Weekly Edition for October 20, 2016
Automatically detecting kernel interface changes
ABI changes can be painful for anybody charged with the development and maintenance of software; that can be doubly so when the changes happen inadvertently and take people by surprise. There is tooling out there that can search for and report ABI changes. At Kernel Recipes 2016, Dodji Seketeli presented some early work he has done on a tool that would find unexpected kernel ABI changes and asked what might seem like an obvious question: would this functionality be useful to the development community?
It is worth noting that he was not talking about the sort of ABI change that kernel developers worry about the most: changes to the user-space ABI. Instead, he is focusing on changes to the loadable-module ABI. At first blush, that might seem like it could reduce the level of interest in his work. As was pointed out in the talk, kernel developers are generally unwilling to talk about the module interface as an ABI at all; at best, it's a fluid API with no stability guarantees. This interface is explicitly allowed to change, so the number of developers wanting a tool to flag those changes might be thought to be small.
Interest in this kind of tool comes mostly from distributors. The
enterprise distributors could use it to let binary-driver vendors know when
something has changed in the module interface. But Ben Hutchings, Debian's
kernel maintainer, said that it would be generally useful to avoid making
changes to the module interface when patching a stable kernel.
The abidiff tool exists to provide just this kind of information. It reads the ELF symbol information from an object file, along with any debug information found there. It uses that information to build an internal representation of the ABI, which can be saved in a special XML file. Given ABI representations from two different objects, abidiff can report on the differences between the two.
Seketeli showed some example output from abidiff; it can be seen in his slides. The tool is able to detect changes in the types of a function's parameters or its return value. Anything that changes the size of a structure or the layout of its members will also be reported on. The removal of functions is noted, and so on. There are mechanisms for reducing noise by filtering out changes that might not be of interest; for example, changes to structures that do not appear in a specific set of header files can be suppressed. Other tools built on top of abidiff can look for ABI changes in libraries stored in package files.
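To make the idea of a layout change concrete, here is a minimal sketch in Python; it is not how abidiff works internally, and it does not read ELF or debug information at all. It uses the standard ctypes module to model two versions of a hypothetical structure (WidgetV1 and WidgetV2 are invented names) and reports the kinds of differences described above: a changed structure size, members that moved, and members that were added or removed.

```python
import ctypes


class WidgetV1(ctypes.Structure):
    """Hypothetical structure as it appears in the old build."""
    _fields_ = [
        ("id", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("count", ctypes.c_uint64),
    ]


class WidgetV2(ctypes.Structure):
    """The same hypothetical structure after a patch inserted a member."""
    _fields_ = [
        ("id", ctypes.c_uint32),
        ("refcount", ctypes.c_uint16),  # new member placed before 'flags'
        ("flags", ctypes.c_uint32),
        ("count", ctypes.c_uint64),
    ]


def layout(struct):
    """Map each member name to its (offset, size) in bytes."""
    return {
        name: (getattr(struct, name).offset, getattr(struct, name).size)
        for name, *_ in struct._fields_
    }


def diff_layout(old, new):
    """Return a list of human-readable layout differences between two structures."""
    report = []
    if ctypes.sizeof(old) != ctypes.sizeof(new):
        report.append(
            f"size changed: {ctypes.sizeof(old)} -> {ctypes.sizeof(new)} bytes"
        )
    old_members, new_members = layout(old), layout(new)
    for name, (offset, size) in old_members.items():
        if name not in new_members:
            report.append(f"member '{name}' removed")
            continue
        new_offset, new_size = new_members[name]
        if offset != new_offset:
            report.append(f"member '{name}' offset changed: {offset} -> {new_offset}")
        if size != new_size:
            report.append(f"member '{name}' size changed: {size} -> {new_size}")
    for name, (offset, _size) in new_members.items():
        if name not in old_members:
            report.append(f"member '{name}' added at offset {offset}")
    return report


if __name__ == "__main__":
    for line in diff_layout(WidgetV1, WidgetV2):
        print(line)
```

Inserting a single two-byte member is enough to move every member that follows it, which is why a module built against the old layout would read the wrong data from the new one.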
None of this works with the kernel now, he pointed out. Wouldn't it be nice, though, to have a tool that could look at a set of kernel modules and exported interfaces and generate a report of what has changed from a previous version?
Getting there requires a bit of work. The tool would need to understand and handle the special ELF sections used in the kernel build; the __ksymtab and __ksymtab_gpl sections, which hold the symbols exported with EXPORT_SYMBOL() and EXPORT_SYMBOL_GPL(), are particularly relevant. Kernel modules also need to be parsed properly, and an interface description generated from the result. The sheer size of the kernel presents a problem as well; it will force some memory-usage optimizations that have not been necessary thus far. These are the sort of issues he has been working on.
Thus far, he has added a kernel-specific mode to the abidw tool, which generates an XML representation of an ABI from an ELF file for use with abidiff. Some examples of the output can be found on this page. Anybody wanting to play with this work can grab a copy of the repository by running:
git clone -b dodji/kabidiff git://sourceware.org/git/libabigail.git
The discussion of this work was wide-ranging and energetic; it is hard to report on here. One topic that came up was the possibility of detecting changes in the user-space ABI instead; that is a tool that would be useful for regression testing in general. That, Seketeli allowed, is a rather harder problem. Even just looking at the system-call interface, it can be hard for a tool to understand what a system call's parameters are supposed to represent.
So a user-space ABI checker is probably not on the immediate horizon. We probably will see a tool that can find changes in the module interface, though, and that will have its own uses. Developers might be surprised to learn how often the changes they make affect the interface used by loadable modules.
Graphics world domination may be closer than it appears
The mainline kernel has support for a wide range of hardware. One place where support has traditionally been lacking, though, is graphics adapters. As a result, a great many people are still using proprietary, out-of-tree GPU drivers. Daniel Vetter went before the crowd at Kernel Recipes 2016 to say that the situation is not as bad as some think; indeed, he said, in this area as well as others, world domination is proceeding according to plan.
The current state of affairs
The first stop on Vetter's tour of the direct rendering manager (DRM) subsystem was documentation, and, in particular, the transition to Sphinx that has unfolded over the last couple of release cycles. The new formatted documentation system for the kernel is "pretty and awesome", and makes writing the documentation fun. As a result, there's now a lot more documentation than there used to be; indeed, the DRM documentation is pretty much complete. The biggest gap at this point is a top-level picture that nicely ties all the pieces together.
Moving on to rather older work (he titled this section "dungeons and dragons"), Vetter noted that there are still some DRM1 drivers around; these are at least ten years old at this point. They feature nasty user-space APIs, root holes, and other delightful things. These drivers are built around a midlayer architecture, a design which has gone out of fashion in recent years; the idea was to make it possible to build the drivers on BSD systems. In current kernels, these drivers are hidden behind the CONFIG_DRM_LEGACY option. They cannot be removed outright without breaking things, though, so they will remain for a while.
The IGT tools from Intel have proved to be a useful test suite for the validation of DRM drivers. They are Intel-specific for now, but are being modified to be more generic. At this point, a number of drivers and continuous-integration systems are using these tests to trap regressions. See the DRM documentation for information on how to validate drivers with the IGT suite.
Recently there has been an influx of DRM developers from the ARM community;
that has led to a new set of problems. The DRM subsystem is special,
Vetter said, in that it requires that the user-space API for any driver be
open source. Much of the code for these drivers runs in user space; the
10% that runs in the kernel is "useless" without the user-space side as
well. A kernel driver without the user-space code cannot be enhanced or
maintained.
The ARM folks were unaware of this restriction and not used to operating in
this mode, so the DRM maintainers have had to start rejecting their
patches. The result was some screaming, but, at this point, the ARM
community understands the requirements and is starting to look at opening
up the user-space code as well.
One of the big changes in the DRM subsystem in recent years has been the switch to the atomic mode-setting API. The original DRM API featured one ioctl() call for each operation to be done; that resulted in a lot of display flickering as applications worked through a long series of changes. The atomic API allows everything to be done with a single call, leading to flicker-free changes. An atomic change is an all-or-nothing affair; if it succeeds at all, it will succeed completely.
This API also provides a separate call to check whether a set of changes would succeed without actually making those changes. It can be hard to know before trying; hardware often has weird restrictions that get in the way. He mentioned adapters with three video outputs but only two clocks as an example. Overlay support (the ability to directly display a video stream from another source, such as a camera, without going through user space) has been added to this API as well. Overlays went out of fashion for a while, but it turns out that a lot of power can be saved by outputting the video directly; it is a crucial feature for mobile systems.
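The check-then-commit pattern can be illustrated with a toy model. The Python sketch below is not the DRM API; the Display class, the atomic_check() and atomic_commit() functions, and the two-clock limit are all hypothetical, chosen to mirror the "three outputs but only two clocks" example. The test_only parameter plays the role of the kernel's test-only request: the full set of changes is validated, but nothing is applied.

```python
from dataclasses import dataclass, field

NUM_CLOCKS = 2  # hypothetical hardware limit: only two pixel clocks available


@dataclass
class Display:
    """Toy display state: maps an output name to its pixel clock in kHz."""
    active: dict = field(default_factory=dict)


def atomic_check(display, changes):
    """Validate a whole set of changes without touching the display.

    'changes' maps an output name to a new clock, or to None to disable
    the output. Returns the proposed state, or raises ValueError.
    """
    proposed = dict(display.active)
    for output, clock in changes.items():
        if clock is None:
            proposed.pop(output, None)
        else:
            proposed[output] = clock
    distinct_clocks = set(proposed.values())
    if len(distinct_clocks) > NUM_CLOCKS:
        raise ValueError(
            f"{len(distinct_clocks)} distinct clocks needed, {NUM_CLOCKS} available"
        )
    return proposed


def atomic_commit(display, changes, test_only=False):
    """Apply all of the changes or none of them; return True on success."""
    try:
        proposed = atomic_check(display, changes)
    except ValueError:
        return False  # nothing was modified
    if not test_only:
        display.active = proposed
    return True


if __name__ == "__main__":
    display = Display()
    # Three outputs can work as long as two of them share a clock.
    print(atomic_commit(display, {"HDMI-A": 148500, "DP-1": 148500, "LVDS": 69300}))
    # Ask whether a third distinct clock would work, without changing anything.
    print(atomic_commit(display, {"DP-1": 270000}, test_only=True))
```

Because validation runs against a copy of the state, a failed request leaves the display exactly as it was, which is the all-or-nothing property described above.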
At this point, there are 20 drivers in the mainline with atomic mode-setting implementations; another two or three are added with each release. This API is being adopted far more quickly than the original kernel mode-setting API was. It helps that a lot of functionality is in common code now, so the drivers themselves have gotten smaller. The support library has been made more modular; using it is not an all-or-nothing affair like it used to be.
Use of the atomic API is growing; one example is the drm_hwcomposer library, written by Google for use with Android systems. The ChromeOS Ozone interface running on Wayland uses it, as do all the other Wayland implementations. We have, he said, "a driver API to rule them all" for the first time.
Looking forward
Turning to future work, Vetter mentioned that there is interest in an interface that can allocate buffers for use with multiple devices. The ION memory allocator offers this functionality, but it remains Android-specific for now.
The old framebuffer device (fbdev) interface has been deprecated for some time, but it still turns out to be useful in some settings. In particular, it can save memory bandwidth and power on some low-end displays — those that require manual uploading of display data. The generic fbdev "defio" interface can now be remapped onto kernel mode-setting operations, making it possible to write a full fbdev driver on top of the DRM subsystem.
The simple display pipeline helper also makes writing simple drivers easy. For settings where there is a simple processing pipeline and a single connector, it can provide access to the atomic API without most of the complexity. With this helper, the DRM API is "now strictly better" than fbdev.
Fences are currently an area of active development. A fence is like the kernel's completion structure, in that it can be used to wait for (and signal) the completion of an operation; it is intended to be used with DMA operations in particular. There are two models for fence usage. In the "implicit" model, the kernel attaches fences to I/O buffers and takes care of everything; user space never sees it. The "explicit" model, instead, has the kernel providing fences to user space, which must then manage them itself.
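The difference between the two models can be sketched with a toy example. The Python code below is only an analogy built on threading.Event; the Fence and Buffer classes and the render_async(), read_implicit(), and read_explicit() functions are hypothetical names, not kernel or DRM interfaces. In the implicit case the fence travels with the buffer and the consumer never handles it; in the explicit case the producer hands the fence to the caller, who must wait on it before touching the buffer.

```python
import threading
import time


class Fence:
    """Toy fence: signaled once when an asynchronous operation completes."""

    def __init__(self):
        self._done = threading.Event()

    def signal(self):
        self._done.set()

    def wait(self, timeout=None):
        """Block until signaled; return False if the timeout expired first."""
        return self._done.wait(timeout)


class Buffer:
    """Toy I/O buffer; 'fence' is the implicitly attached fence, if any."""

    def __init__(self):
        self.data = None
        self.fence = None


def render_async(buf, value):
    """Start filling 'buf' in the background and return the fence for that work."""
    fence = Fence()
    buf.fence = fence  # implicit model: the fence is attached to the buffer

    def work():
        time.sleep(0.01)  # stand-in for a DMA or rendering operation
        buf.data = value
        fence.signal()

    threading.Thread(target=work).start()
    return fence  # explicit model: the caller may manage this fence itself


def read_implicit(buf):
    """Implicit model: waiting happens behind the caller's back."""
    if buf.fence is not None:
        buf.fence.wait()
    return buf.data


def read_explicit(buf, fence):
    """Explicit model: the caller supplies the fence it was given."""
    fence.wait()
    return buf.data


if __name__ == "__main__":
    implicit_buf = Buffer()
    render_async(implicit_buf, "frame-1")
    print(read_implicit(implicit_buf))

    explicit_buf = Buffer()
    fence = render_async(explicit_buf, "frame-2")
    print(read_explicit(explicit_buf, fence))
```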
The implicit model has been implemented for some time, in the form of reservation_object structures attached to DMA buffers. The TTM memory manager (used with the AMD and Nouveau drivers) has always supported it; other drivers are picking up support over time. This is the model preferred by the Linux desktop; both X and Wayland expect implicit fencing.
On the other hand, the Android system wants to use explicit fencing. It provides more control to user space and reduces the need for complexity in (vendor-supplied) graphics drivers. That was the driving factor in Android's decision, Vetter said; no vendor proved able to implement implicit fences correctly. The DRM subsystem implements an explicit fence as a sync_file structure, which is returned to user space as a file descriptor. User-space fences will be supported in the 4.9 kernel; the MSM/freedreno driver has added support so far.
As one might imagine, there is some tricky interaction between implicit and explicit fences. The solution that has been chosen is to use implicit fences by default, but to switch to the explicit model as soon as an application calls one of the explicit-fencing extensions.
Google has created the "HWC2" composer that can make use of DRM's explicit-fencing support; it is not yet publicly released, Vetter said, but will hopefully show up in 4.10. More information will be available at the Linux Plumbers Conference. Sometime soon it will be possible to run Android on a mainline kernel with an open-source graphics stack, he said.
Along those lines, what is the status of low-level GPU drivers? At this point, there are three vendor-supported open drivers in the mainline, and three more reverse-engineered ones. Of those, the Nouveau driver runs fairly well on Tegra systems. The freedreno driver is "pretty feature-complete" and is now competitive with proprietary drivers. The etnaviv driver is coming along, but still needs work on the user-space side. But, he said, there are still no vendor-supported system-on-chip drivers; that situation is "pretty dire."
He finished up by noting that the atomic API now "rules them all." There has been a lot of progress in documentation and general cleanup; all of the major gaps for authors of display drivers have been closed. Cross-driver fencing is reaching a point of being ready for everyone, and even rendering is showing some (albeit slow) progress. Upstream graphics, he said, is finally winning.
Page editor: Jonathan Corbet
