Kernel development
Brief items
Kernel release status
The 4.9 merge window remains open; see the separate article below for a summary of the work merged so far.
Stable updates: 4.8.1, 4.7.7, and 4.4.24 were released on October 7.
Quotes of the week
I will have to ask around the security people to see what they think.
Kernel development news
4.9 Merge window part 2
As of this writing, Linus has pulled 13,488 non-merge changesets into the mainline repository for the 4.9 development cycle. That suggests that not only will 4.9 be the busiest cycle in the kernel's history, but that it will surpass the previous record (3.15, at 13,722 changesets) before the merge window closes. The merging of the greybus driver code has a lot to do with that but, even without greybus, there is a lot going on this time around.
Among the user-visible changes merged since last week's summary are:
- The system calls for the memory protection keys feature have been
merged. The pkey_alloc(), pkey_free(), and
pkey_mprotect() calls are as described in this article, but the pkey_get()
and pkey_set() calls, which can be implemented purely in
user space, were not included. See Documentation/x86/protection-keys.txt for
details.
- The bottleneck bandwidth and RTT (BBR)
congestion control algorithm has been merged.
- The BATMAN mesh networking subsystem has a new netlink-based
configuration mechanism that will, over time, supersede and replace
the older, debugfs-based interface.
- The netfilter module supports a new "quota" mechanism designed to
enable the enforcement of byte quotas. There's also a new
random-number generation module intended to enable the random
distribution of packets (across multiple queues, for example).
- There is a new just-in-time BPF compiler that can be used to load BPF
programs for execution within Netronome network interfaces. In 4.9,
only the cls_bpf classifier module will take advantage of
this capability.
- The filesystems in user space (FUSE) module now supports POSIX
access-control lists.
- The Greybus subsystem has been
merged. This bus was intended for the "Project Ara" phone, which has
since been canceled, but Greg Kroah-Hartman successfully argued for its inclusion
anyway. This merge includes the entire development history
for this code, some 2,400 changesets in total.
- There is a new set of resource limits controlling how many namespaces
may be created within any given user namespace. See Documentation/sysctl/user.txt for
details.
- The hardware latency tracer (which seeks to flush out latencies caused
by the hardware itself) has moved into the mainline from the realtime
tree. See Documentation/trace/hwlat_detector.txt
for details and usage information.
- The ubifs filesystem now supports overlayfs and the O_TMPFILE
file-creation option.
- New hardware support includes:
- Systems and processors:
Broadcom BCM53573-based processors.
- Audio:
Nuvoton NAU8810 audio codecs,
Realtek RT5660/RT5663/RT5668 audio codecs,
X-Powers AC100 audio codecs, and
Samsung Exynos SoC low power audio subsystems.
- Industrial I/O:
Maxim thermocouple sensors,
Measurement Computing CIO-DAC digital-to-analog converters,
Asahi Kasei AK8974 3-axis magnetometers,
Domintech DMARD05/DMARD06/DMARD07 accelerometers,
Texas Instruments ADC161S626 1-channel differential
analog-to-digital converters,
Texas Instruments ADC12130/ADC12132/ADC12138 analog-to-digital
converters,
MediaTek mt65xx analog-to-digital converters,
Linear Technology LTC2485 analog-to-digital converters,
Analog Devices AD8801/AD8803 digital-to-analog converters,
Apex Embedded Systems STX104 analog-to-digital converters,
mCube MC3230 digital accelerometers, and
Murata ZPA2326 pressure sensors.
- Media:
Atmel image sensor controllers,
Analog Devices AD5820 lens voice coils,
Techwell TW5864 video/audio grabber/encoders,
STMicroelectronics HVA multi-format video encoders,
STMicroelectronics STiH4xx HDMI CEC interfaces, and
Gennum GS1662 HD/SD-SDI serializers.
- Miscellaneous:
Rockchip RK818 power-management chips,
Elan eKTF2127 touchscreen controllers,
Microsemi PQI SCSI controllers,
Intel integrated sensor hubs,
Cavium ThunderX I2C buses,
Cavium ThunderX random number generators,
APM X-Gene SoC performance monitoring units,
Qualcomm external bus interfaces (version 2),
JDI LT070ME05000 WUXGA DSI panels, and
Amlogic Meson PWM controllers.
- Networking:
Microsemi VSC85xx PHYs,
Amazon Elastic Network adapters,
Thunder RGX/RGMII MAC interfaces,
Chelsio crypto coprocessors,
Qualcomm EMAC gigabit Ethernet controllers, and
Qualcomm Atheros QCA8K Ethernet switches.
- Pin Control / GPIO:
Aspeed G4/G5 pin and GPIO controllers,
NextThing GR8 pin controllers,
X-Powers AXP209 PMIC GPIO controllers,
Intel Whiskey Cove PMIC GPIO controllers,
Diamond Systems GPIO-MM controllers,
Technologic Systems FPGA I2C GPIO controllers, and
TI LP873X PMIC GPIO controllers.
- Thermal: Qualcomm TSENS temperature sensors, QorIQ thermal monitoring units, and Intel Broxton PMIC thermal monitors.
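The protection-keys item above notes that some of the originally proposed calls can be implemented purely in user space. Reading and changing a key's access rights is one such case: on x86, the per-thread rights for each key live in the PKRU register, which user space can access directly with the RDPKRU and WRPKRU instructions. The Python sketch below models only the bit arithmetic involved. The register is represented by a plain integer, and the names pkru_get_rights() and pkru_set_rights() are invented for illustration rather than taken from any library.

```python
# Rights bits for one protection key, matching the values of the
# PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE constants.
PKEY_DISABLE_ACCESS = 0x1
PKEY_DISABLE_WRITE = 0x2

NUM_PKEYS = 16        # x86 provides 16 protection keys
BITS_PER_PKEY = 2     # each key owns two adjacent bits in PKRU
RIGHTS_MASK = PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE


def _shift(pkey):
    """Bit offset of a key's rights field (hypothetical helper)."""
    if not 0 <= pkey < NUM_PKEYS:
        raise ValueError(f"invalid protection key: {pkey}")
    return pkey * BITS_PER_PKEY


def pkru_get_rights(pkru, pkey):
    """Return the rights bits for pkey from a PKRU value."""
    return (pkru >> _shift(pkey)) & RIGHTS_MASK


def pkru_set_rights(pkru, pkey, rights):
    """Return a new PKRU value with pkey's rights replaced."""
    if rights & ~RIGHTS_MASK:
        raise ValueError(f"invalid rights: {rights:#x}")
    shift = _shift(pkey)
    cleared = pkru & ~(RIGHTS_MASK << shift)
    return cleared | (rights << shift)


# Assumed starting value for the example: no key restricts anything.
pkru = 0
pkru = pkru_set_rights(pkru, 3, PKEY_DISABLE_WRITE)
print(hex(pkru))                    # key 3 occupies bits 6 and 7
print(pkru_get_rights(pkru, 3))
```

A real implementation would read the register, apply the same mask-and-shift logic, and write the result back; the kernel is only needed to allocate keys and to attach them to memory ranges.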
Changes visible to kernel developers include:
- The handling of messages printed with printk() has changed for
the case of single-line messages created with multiple
printk() calls. The rule has long been that the continuation
lines should be marked with the KERN_CONT pseudo log level,
but that requirement has not been enforced for several years. As of
this
commit, the use of KERN_CONT is again mandatory; without
it, output will be garbled. Many places in the kernel will need
fixing; for the short term, expect some ugly output from 4.9-rc
kernels.
- The "kthread_worker" API has seen a number of changes. These include
the renaming
of most functions to start with "kthread_"
(e.g. init_kthread_worker() becomes
kthread_init_worker()), the addition of kthread_create_worker()
and kthread_destroy_worker(),
support for delayed kthread
work, and support for freezable
kthreads.
- The network subsystem has added a module called "strparser"; its job
is to parse (in-kernel) application-layer protocol messages from a TCP
connection. See Documentation/networking/strparser.txt
for details.
- The handling of extended attributes in filesystems has changed.
Filesystems that support extended attributes should create an
xattr_handlers structure with its low-level methods and
attach it to the superblock structure. The
setxattr(), getxattr() and removexattr()
inode operations are no longer used and have been removed.
- The rename() inode operation has gained a flags
argument. In truth, rename() was removed and the
rename2() operation was, well, renamed; all in-kernel filesystems
have been updated to reflect the change.
- The new function current_time() returns the current time at the proper resolution for storage in a specific filesystem; it replaces the old CURRENT_TIME macro. Among other things, the new API is year-2038 safe.
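The printk() change in the first item above can be pictured with a toy model. In the kernel headers, KERN_CONT expands to a short prefix (the byte \001 followed by "c") placed at the front of the format string. The Python sketch below is not the kernel's implementation: the function assemble() is hypothetical, and it ignores log levels, trailing newlines, and interleaving with output from other CPUs. It shows only the rule that a fragment lacking the prefix starts a new line of output.

```python
KERN_CONT = "\x01c"   # same bytes as the kernel's KERN_SOH "c"


def assemble(fragments):
    """Group printk()-style fragments into output lines (toy model)."""
    lines = []
    for fragment in fragments:
        is_cont = fragment.startswith(KERN_CONT)
        text = fragment[len(KERN_CONT):] if is_cont else fragment
        if is_cont and lines:
            # Marked continuation: append to the line in progress.
            lines[-1] += text
        else:
            # Unmarked fragment, or nothing to continue: start a new line.
            lines.append(text)
    return lines


# The same two-part message, without and with the continuation marker.
unmarked = assemble(["probing device:", " ok"])
marked = assemble(["probing device:", KERN_CONT + " ok"])
print(unmarked)
print(marked)
```

In this model the unmarked message comes out as two separate lines, which is the kind of broken-up output that unconverted call sites can be expected to produce.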
At this point, it seems likely that things will slow down considerably as the 4.9 merge window approaches its scheduled closing on October 16.
On Linux kernel maintainer scalability
LWN's traditional development statistics article for the 4.6 development cycle ended with a statement that the process was running smoothly and that there were no process scalability issues in sight. Wolfram Sang started his 2016 LinuxCon Europe talk by taking issue with that claim. He thinks that there are indeed scalability problems in the kernel's development process. A look at his argument is of interest, especially when contrasted with another recent talk on maintainer scalability.
Beyond changesets merged
Sang's core point is that looking at the number of patches merged only tells part of the story; it says nothing about what had to happen to get those patches into the mainline. Looking at the last few years' worth of development cycles, he noted that relatively few patches carry tags beyond the Signed-off-by applied by the developer and the committer. In particular, around the 3.0 days, only about 20% of the patches in the mainline had an Acked-by, Reviewed-by, or Tested-by tag indicating that anybody other than the maintainer had seriously looked at them. That number is closer to 40% in current kernels, he said; it is a clear improvement, but still does not make him happy. For a properly scalable kernel process, he said, we should have much higher levels of review by developers who are not the subsystem maintainer.
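The tag metric described above amounts to a simple count over commit messages. The sketch below assumes the messages are already available as strings (extracted from a repository's history, for example). The sample messages, the names in them, and the function review_fraction() are made up for illustration; this is not necessarily how Sang produced his numbers.

```python
REVIEW_TAGS = ("Acked-by:", "Reviewed-by:", "Tested-by:")


def has_review_tag(message):
    """True if any line of the commit message starts with a review tag."""
    return any(
        line.strip().startswith(REVIEW_TAGS)
        for line in message.splitlines()
    )


def review_fraction(messages):
    """Fraction of commit messages carrying at least one review tag."""
    if not messages:
        return 0.0
    tagged = sum(1 for message in messages if has_review_tag(message))
    return tagged / len(messages)


# Invented sample data; real input would come from a repository's history.
sample = [
    "i2c: fix timeout\n\nSigned-off-by: A Developer <a@example.com>",
    "net: add driver\n\nSigned-off-by: B Developer <b@example.com>\n"
    "Reviewed-by: C Reviewer <c@example.com>",
    "gpio: cleanup\n\nSigned-off-by: D Developer <d@example.com>\n"
    "Tested-by: E Tester <e@example.com>",
    "docs: typo\n\nSigned-off-by: F Developer <f@example.com>",
    "mm: tweak\n\nSigned-off-by: G Developer <g@example.com>",
]
print(f"{review_fraction(sample):.0%} of sample commits carry a review tag")
```

Signed-off-by lines are deliberately not counted, since they record who handled the patch rather than who reviewed it.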
Another metric one can look at is the time difference between the date on
the patch and the date on which it was first committed to a git tree. The
Ethernet driver maintainers, he said, are heroes: 80% of all the patches
were accepted within two weeks. A number of other subsystems do not do
anywhere near as well, and some have gotten significantly worse. I2C,
Sang's own subsystem, has stayed about the same over the last three years,
which surprised him. As the workload has increased, it has come to feel
like things are getting much worse.
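The time-to-commit metric described above can be sketched as a simple computation. Git records both an author date and a commit date for each changeset; the example below assumes those two dates have already been extracted as pairs and treats the commit date as an approximation of when the patch first landed in a tree. The data and the function fraction_within() are invented for illustration.

```python
from datetime import date, timedelta


def fraction_within(pairs, limit=timedelta(days=14)):
    """Fraction of (author_date, commit_date) pairs at most limit apart."""
    if not pairs:
        return 0.0
    quick = sum(1 for authored, committed in pairs
                if committed - authored <= limit)
    return quick / len(pairs)


# Invented sample: (date on the patch, date it landed in a tree).
sample = [
    (date(2016, 9, 1), date(2016, 9, 3)),
    (date(2016, 9, 1), date(2016, 9, 15)),   # exactly 14 days
    (date(2016, 8, 1), date(2016, 9, 20)),   # 50 days
    (date(2016, 9, 10), date(2016, 9, 12)),
    (date(2016, 9, 5), date(2016, 9, 6)),
]
print(fraction_within(sample))
```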
The time-to-commit metric may be useful, but it is not without its flaws. The final version of a patch may have been committed fairly quickly, but previous versions could have languished without review for a long time. Patches that are rejected or that get lost are not considered at all.
One way to try to get a better handle on things is to look at the Patchwork systems for the subsystems that use it, and, in particular, to look at the backlog of patches found there. For I2C, it shows a relatively low backlog until about 3.16, when he gave up on trying to keep up with the flow and fell behind. The ACPI subsystem has an amazing backlog of zero. The relevant maintainer (Rafael Wysocki) was in the room; he noted that it depends on how a subsystem uses Patchwork. He said that he quickly marks a lot of patches as inapplicable; Sang replied that he doesn't even have the time to do that. The ext4 filesystem shows a linear growth in its backlog, up to about 800 patches currently. The numbers for several other subsystems were shown; almost all of them are going up.
The problem, Sang said, is that the number of committers is not scaling to match the growing number of contributors to the kernel. We are getting more reviewers, but they are coming in slowly and are not anywhere near enough. As a result, the number of unprocessed patches is on the increase.
How can this problem be addressed? Users can help by commenting on and, especially, testing patches. Developers need to be aware that sloppiness is often a problem; they should acknowledge when they have done suboptimal work. Developers need to take part in reviewing; if nothing else, they should review their own patches. For maintainers, working harder is not generally the solution; that just leads to burnout. They should get their tools in order and automate tasks whenever possible; looking at what other maintainers are using can be helpful. Companies should allow and encourage their developers to spend time reviewing patches.
What he does not want to see is a "kernel infrastructure initiative". The Core Infrastructure Initiative, run by the Linux Foundation as a way to channel resources to important but underfunded projects, is a good thing, but it is a reaction to a problem that got out of control. Things had to go wrong first. Sang would rather see action now to keep things from getting to that state.
For I2C, Sang intends to step back a bit. He will become one of the I2C developers, one of its architects, and one of its reviewers, but he will not be the only one. That may slow things down in the short term, since he will be doing less patch review. The advantage is that he will stay sane, and will have the time and energy to try to address the problem on higher levels.
The maintainer as bottleneck
While Sang intends to step back on patch review, his plan still calls for him to be the sole committer of patches for the I2C subsystem. In this context, it is interesting to look at another talk, given at Kernel Recipes one week earlier by i915 graphics driver maintainer Daniel Vetter. He, too, made the point that maintainers don't scale, but he would rather see maintainers get help at all levels.
One year ago, he would have said that there was no problem in the i915 subsystem. Applying patches was relatively easy, after all. He had never reviewed the majority of the patches there; i915 has a number of developers who can do that. But, as the single maintainer, he gave the subsystem "a bus factor of one"; when he wasn't available for any reason, things simply came to a stop.
At the 2015 Kernel Summit, Linus Torvalds said
that he has come to
like the group maintainer model, where more than one person takes
responsibility for a given subsystem. Vetter wanted to give that a try,
but he quickly ran into a problem: nobody was willing to sign up as the
co-maintainer for the i915 subsystem. He was, however, able to find
developers who were willing to commit patches for i915; indeed, he signed
up 15 of them. He figured he would experiment with the multiple-committer
model for one release cycle. After all, nobody had ever really tried this
before in the kernel, so it must be a stupid idea.
That was one year ago, he said, and disaster has failed to materialize. Instead, he has "seriously happy contributors," and a whole set of reviewers who can apply the patches they look at. He is now "a bored maintainer," and all of the nagging and begging to get code merged has gone away. He has found that commit rights are a strong carrot that can be used to get developers and companies to contribute — and to be careful about the work they do. It also leads to "distributed conflict management" that makes life easier.
So what does he do anymore? His main job at this point, as "the" maintainer for i915, is communications with the outside, including any work that requires coordination with other subsystem trees. He connects developers with the appropriate reviewers, and puts together the pull requests to send work upstream. And, of course, he "takes the blame for everything".
To make this model work, he said, a subsystem clearly needs a team of developers, and non-maintainer reviews must be the norm. The group should be consistent, with developers who stay around; otherwise, enforcement via social feedback will not work well. Good documentation and tools are necessary; i915 has a set of process documents on this page. When somebody makes a mistake, if possible, a check should be put into the tools to keep it from happening again.
Good testing is crucial to this model. A multi-committer tree can never be rebased, so there is no way to remove embarrassing mistakes. They really need to be avoided in the first place; that requires good pre-commit testing to ensure that the obscure corner cases do not break.
The rough consensus model works best for a group like this. The default on any patch is "no action", so a developer's full disagreement will stop things. What's important, he said, is to have agreement on the goals for the subsystem; disagreement on the path taken toward those goals is acceptable. A good rule of thumb is "if you push a patch and there's screaming on IRC, you shouldn't have done it."
In general, he said, the kernel could probably benefit from more maintainer groups like this. It is a more efficient way to maintain busy subsystems, especially those that currently have a lot of submaintainer trees.
Meanwhile in Berlin
Fast-forward one week; your editor raised this idea in Sang's talk and asked whether the single-committer model might be part of the scalability problems raised there. The developers in that room tended toward skepticism over whether the idea could work outside of the i915 tree. Wysocki, in particular, seemed to feel that there were relatively few submaintainers who could be trusted with full commit access. These maintainers push patches that must be rejected fairly often, so they should not be able to commit directly to the subsystem tree.
Perhaps these developers, too, would be pleasantly surprised if they were to run an experiment with more widely distributed commit rights. In any case, it seems likely that growing numbers of developers and patches will put more stress on subsystem maintainers. If those maintainers are not to become a choke point for kernel development, ways to spread the work they do will be required.
[Your editor thanks both the Linux Foundation and Kernel Recipes for supporting his travel to these events.]
Page editor: Jonathan Corbet
