
Kernel development

Brief items

Kernel release status

The current development kernel is 4.2-rc8, released on August 23. In the end, Linus decided to wait one more week before putting out the final 4.2 release. "It's not like there are any real outstanding issues, and I waffled between just doing the release and doing another -rc. But we did have another low-level x86 issue come up this week, and together with the fact that a number of people are on vacation, I decided that waiting an extra week isn't going to hurt. But it was close. It's a fairly small rc8, and I really feel like it could have gone either way."

Previously, 4.2-rc7 came out on August 16.

Stable updates: 4.1.6, 3.14.51, and 3.10.87 were released on August 17.


The bcachefs filesystem

Kent Overstreet, author of the bcache block caching layer, has announced that bcache has metamorphosed into a fully featured copy-on-write filesystem. "Well, years ago (going back to when I was still at Google), I and the other people working on bcache realized that what we were working on was, almost by accident, a good chunk of the functionality of a full blown filesystem - and there was a really clean and elegant design to be had there if we took it and ran with it. And a fast one - the main goal of bcachefs to match ext4 and xfs on performance and reliability, but with the features of btrfs/zfs."


Kernel development news

The bcachefs filesystem

By Jonathan Corbet
August 25, 2015
The Linux kernel does not lack for filesystem support; many dozens of filesystem implementations are available for one use case or another. But, after all these years, Linux arguably lacks an established "next-generation" filesystem with advanced features and a design suited to contemporary hardware. That situation holds despite the existence of a number of competitors for that title; Btrfs remains at the top of the list, but others, such as tux3 and (still!) reiser4, are out there as well. In each case, it has taken rather longer than expected for the code to reach the required level of maturity. The list of putative next-generation filesystems has just gotten longer with the recent announcement of the "bcachefs" filesystem.

Bcachefs is an extension of bcache, which first appeared in LWN in 2010. Bcache was designed as a caching layer that improves block I/O performance by using a fast solid-state drive as a cache for a (slower, larger) underlying storage device. Bcache has been steadily developed over the last five years; it was merged into the mainline kernel during the 3.10 development cycle in 2013.

Mainline bcache is not a filesystem; instead, it looks like a special kind of block device. It manages the movement of blocks of data between fast and slow storage, working to ensure that the most frequently used data is kept on the faster device. This task is complex; bcache must manage data in a way that yields high performance while ensuring that no data is ever lost, even in the face of an unclean shutdown. Even so, at its interface to the rest of the system, bcache looks like a simple block device: give it numbered blocks of data, and it will store (and retrieve) them.

Users typically want something a bit higher-level than that; they want to be able to organize blocks into files, and files into directory hierarchies. That task is handled by a filesystem like ext4 or Btrfs. Thus, on current systems, bcache will be used in conjunction with a filesystem layer to provide a complete solution.

It seems that, over time, bcache has developed the potential to provide filesystem functionality on its own. In the bcachefs announcement, Kent Overstreet said:

Well, years ago (going back to when I was still at Google), I and the other people working on bcache realized that what we were working on was, almost by accident, a good chunk of the functionality of a full blown filesystem - and there was a really clean and elegant design to be had there if we took it and ran with it.

The actual running with this idea appears to have happened relatively recently, with the first publicly visible version of the bcachefs code being committed to the bcache repository in May 2015. Since then, it has seen a steady stream of commits from Kent; it was announced on the bcache mailing list in mid-July, and on linux-kernel just over a month later.

With the bcachefs code added, bcache has gained the namespace and file-management features that, until now, had to be supplied by a separate filesystem layer. Like Btrfs, it is a copy-on-write filesystem, meaning that data is never overwritten. Instead, a block that is overwritten moves to a new location, with the older version persisting as long as any references to it remain. Copy-on-write works well on solid-state storage devices and makes a number of advanced features relatively easy to implement.
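
The core of the idea fits in a few lines. The following is a toy sketch in C (in no way bcachefs's actual code; every name here is invented) of how an overwrite allocates a new block rather than updating the old one in place:

    #include <stdlib.h>
    #include <string.h>

    #define BLOCK_SIZE 4096

    struct cow_store {
            void *current[1024];    /* logical block -> live version */
    };

    /* Overwriting a logical block never touches the old data: a new
     * buffer is allocated and the index is repointed. The old buffer
     * survives as long as snapshots or other references need it; a
     * real filesystem would reclaim it once the last reference dies. */
    static int cow_write(struct cow_store *s, int lblock, const void *data)
    {
            void *newbuf = malloc(BLOCK_SIZE);

            if (!newbuf)
                    return -1;
            memcpy(newbuf, data, BLOCK_SIZE);
            s->current[lblock] = newbuf;
            return 0;
    }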

Since the original bcache was a block-device management layer, bcachefs has some strong features in this area. Naturally, it offers multi-tier hybrid caching of data, and is able to integrate multiple physical devices into a single logical volume. Bcachefs does not appear to have any sort of higher-level RAID capability at this time, though; a basic replication mechanism is "like 80% done". Features like data checksumming and compression are supported.

The plans for the future include filesystem features like snapshots — an important Btrfs feature that is not yet available in bcachefs. Kent listed erasure coding as well, presumably as an alternative to higher-level RAID support. Native support for shingled magnetic recording drives is on the list, as is support for working with raw flash storage directly.

But none of those features are present in bcachefs now; work has been focused on getting the basic filesystem working reliably. Performance tuning has not been a priority thus far, but bcachefs already claims reasonable performance numbers — though, as Kent admitted, it suffers from a problem common to copy-on-write filesystems: "filling up" well before the underlying storage is actually full of data. Importantly, the on-disk filesystem format has not yet been finalized — a clear sign that a filesystem is not yet ready for real-world use.

Another important (though unlisted) missing feature is a filesystem integrity checker ("fsck") utility.

Bcachefs looks like a promising filesystem, even if many of the intended features have not yet been implemented. But those who have watched filesystem development for any period of time will know what comes next: a surprisingly long wait while the code matures to the point that it can actually be trusted for production workloads. This process, it seems, cannot be hurried beyond a certain point; that is why other next-generation filesystem efforts are seemingly never quite ready. The low-level device-management code in bcachefs is tested and production-quality, but the filesystem code lacks that pedigree. Kent says that it "won't be done in a month (or a year)", but the truth is that it may not be done for several years yet; that is how filesystem development tends to go.

How many years depends, of course, on how many people test the filesystem and how much development effort it gets. Currently it has a development community of one — Kent — and he has noted that his full-time attention is "only going to last as long as my interest and my savings account hold out". If bcachefs acquires both a commercial sponsor and a wider development community, it may yet develop into that mature next-generation filesystem that we seem to never quite get (though Btrfs is there by some accounts). Until that happens, it should probably be looked at as an interesting idea with some advanced proof-of-concept code.


Steps toward power-aware scheduling

By Jonathan Corbet
August 25, 2015
Power-aware scheduling appears to have become one of those perennial linux-kernel topics that never quite reach a conclusion. Nobody disputes the existence of a problem to be solved, and potential solutions are not in short supply. But somehow none of those solutions ever quite makes it to the point of being ready for incorporation into the mainline scheduler. A few new patch sets showing a different approach to the problem have made the rounds recently. They may not be ready for merging either, but they do show how the understanding of the problem is evolving.

A sticking point in recent years has been the fact that there are a few subsystems related to power management and scheduling, and they are poorly integrated with each other. The cpuidle subsystem makes guesses about how deeply an idle CPU should sleep, but it does so based on recent history and without a view into the system's current workload. The cpufreq mechanism tries to observe the load on each CPU to determine the frequency and voltage the CPU should be operating at, but it doesn't talk to the scheduler at all. The scheduler, in turn, has no view of a CPU's operating parameters and, thus, cannot make optimal scheduling decisions.

It has become clear that this scattered set of mechanisms needs to be cleaned up before meaningful progress can be made on the current problem set. The scheduler maintainers have made it clear that they won't be interested in solutions that don't bring the various control mechanisms closer together.

Improved integration

One possible part of the answer is this patch set from Michael Turquette, currently in its third revision. Michael's patch replaces the current array of cpufreq governors with a new governor that is integrated with the scheduler. In essence, the scheduler occasionally calls directly into the governor, passing it a value describing the load that, the scheduler thinks, is currently set to run on the CPU. The governor can then select a frequency/voltage pair that enables the CPU to execute that load most efficiently.

The projected load on each CPU is generated by the per-entity load tracking subsystem. Since each process has its own tracked load, the scheduler can quickly sum up the load presented by all of the runnable processes on a CPU and pass that number on to the governor. If a process changes its state or is moved to another CPU, the load values can be updated immediately. That should make the new governor much more responsive than current governors, which must observe the CPU for a while to determine that a change needs to be made.
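
Schematically (with invented names; this is not the patch set's actual interface), the governor's job then reduces to choosing the lowest operating point whose capacity covers the load the scheduler reports:

    /* Hypothetical helper: freq_table[] and capacity_table[] describe
     * the CPU's operating points in increasing order; return the first
     * frequency whose capacity covers the scheduler's load estimate. */
    static unsigned int pick_frequency(unsigned long projected_load,
                                       const unsigned int *freq_table,
                                       const unsigned long *capacity_table,
                                       int nr_states)
    {
            int i;

            for (i = 0; i < nr_states; i++)
                    if (capacity_table[i] >= projected_load)
                            return freq_table[i];

            return freq_table[nr_states - 1];   /* saturate at maximum */
    }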

The per-entity load tracking code was a big step forward when it was added to the scheduler, but it still has some shortcomings. In particular, its concept of load is not tied to the CPU any given process might be running on. If different CPUs are running at different frequencies, the loads computed for processes on those CPUs will not be comparable. The problem gets worse on systems (like those based on the big.LITTLE architecture) where some CPUs are inherently more powerful than others.

The solution to this problem appears to be Morten Rasmussen's compute-capacity-invariant load/utilization tracking patch set. With these patches applied, all load and utilization values calculated by the scheduler are scaled relative to the current CPU capacity. That makes these values uniform across the system, allowing the scheduler to better judge the effects of moving a process from one CPU to another. It also will clearly help the power-management problem: matching CPU capacity to the projected load will work better if the load values are well-calibrated and understood.

With those two patch sets in place, the scheduler will be better equipped to run the system in a relatively power-efficient manner (though related issues like optimal task placement have not yet been addressed here). In the real world, though, not everybody wants to run in the most efficient mode all the time. Some systems may be managed more for performance than for power efficiency; the desired policy on other systems may vary depending on what jobs are running at the time. Linux currently supports a number of CPU-frequency governors designed to implement different policies; if the scheduler-driven governor is to replace all of those, it, too, must be able to support multiple policies.

Schedtune

One possible step in that direction can be seen in this patch set from Patrick Bellasi. It adds a tuning mechanism to the scheduler-driven governor so that multiple policies become possible. At its simplest, this tuning takes the form of a single, global value, stored in /proc/sys/kernel/sched_cfs_boost. The default value for this parameter is zero, which indicates that the system should be run for power efficiency. Higher values, up to 100, bias CPU frequency selection toward performance.
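
Assuming a kernel with this patch set applied, selecting a policy is a simple sysctl write; for example, to make half of the margin (described next) available:

    # echo 50 > /proc/sys/kernel/sched_cfs_boost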

The exact meaning of this knob is fairly straightforward. At any given time, the scheduler can calculate the CPU capacity that it expects the currently runnable processes to require. The space between that capacity and the maximum capacity the CPU can provide is called the "margin." A non-zero value of sched_cfs_boost describes the percentage of the margin that should be made available via a more aggressive CPU-frequency/voltage selection.

So, for example, if the current load requires a CPU running at 60% capacity, the margin is 40%. Setting sched_cfs_boost to 50 will cause 50% of that margin to be made available, so the CPU should run at 80% of its maximum capacity. If sched_cfs_boost is set to 100, the CPU will always run at its maximum speed, optimizing the system as a whole for performance.
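
In code form, the calculation is just the percentage arithmetic from the example above (a sketch; the kernel's actual implementation works in fixed-point capacity units):

    /* Map a load estimate (as a percentage of maximum CPU capacity)
     * and a boost value to a target capacity. With load = 60 and
     * boost = 50: margin = 40, target = 60 + 40 * 50 / 100 = 80. */
    static unsigned long boosted_capacity(unsigned long load,
                                          unsigned long boost)
    {
            unsigned long margin = 100 - load;

            return load + margin * boost / 100;
    }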

What about situations where the desired policy varies over time? A phone handset may want to run with higher performance while a call is active or when the user is interacting with the screen, but in the most efficient mode possible while checking for the day's obligatory pile of app updates. One could imagine making the desired power policy a per-process attribute, but Patrick instead opted to use the control-group mechanism.

With Patrick's patch set comes a new controller called "schedtune". That controller offers a single knob, called schedtune.boost, to describe the policy that should apply to processes within the group. One possible implementation would be to change the CPU's operating parameters every time a new process starts running, but there are a couple of problems with that approach. It could lead to excessive changing of CPU frequency and voltage, which can be counterproductive. Beyond that, though, a process needing high performance could find itself waiting behind another that doesn't; if the CPU runs slowly during that wait, the high-performance process may not get the response time it needs.

To avoid such problems, the controller looks at all running processes on the CPU and finds the one with the largest boost value. That value is then used to run all processes on the CPU.
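
In other words (sketch only; not Patrick's actual code), the CPU-wide boost is a simple maximum over the runnable tasks:

    /* The boost applied to the CPU is the largest boost requested by
     * any task currently runnable on it, so a high-performance task is
     * never slowed down by a low-boost neighbor. */
    static int cpu_effective_boost(const int *task_boost, int nr_runnable)
    {
            int i, max_boost = 0;

            for (i = 0; i < nr_runnable; i++)
                    if (task_boost[i] > max_boost)
                            max_boost = task_boost[i];

            return max_boost;
    }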

The schedtune controller as currently implemented has a couple of interesting limitations. It can only handle a two-level control group hierarchy, and it can manage a maximum of sixteen possible groups. Neither of these characteristics fits well with the new, unified-hierarchy model for control groups, so the schedtune controller is highly likely to require modification before this patch set could be considered for merging into the mainline.

But, then, experience says that eventual merging may be a distant prospect in any case. The scheduler must work well for a huge variety of workloads, and cannot be optimized for one at the expense of others. Finding a way to add power awareness to the scheduler in a way that works for all workloads was never going to be an easy task. The latest patches show that progress is being made toward a general-purpose solution that, with luck, leaves the scheduler more flexible and maintainable than before. But whether that progress is reaching the point of being a solution that can be merged remains to be seen.


Porting Linux to a new processor architecture, part 1: The basics

August 26, 2015

This article was contributed by Joël Porquet

Although a simple port may amount to as little as 4,000 lines of code—exactly 3,775 for the MMU-less Hitachi H8/300 recently reintroduced in Linux 4.2-rc1—getting the Linux kernel running on a new processor architecture is a difficult task. Worse still, there is not much documentation available describing the porting process. The aim of this series of three articles is to provide an overview of the procedure, or at least one possible procedure, that can be followed when porting the Linux kernel to a new processor architecture.

After spending countless hours becoming almost fluent in many of the supported architectures, I discovered that a well-defined skeleton shared by the majority of ports exists. Such a skeleton can logically be split into two parts that intersect a great deal. The first part is the boot code, meaning the architecture-specific code that is executed from the moment the kernel takes over from the bootloader until init is finally executed. The second part concerns the architecture-specific code that is regularly executed once the booting phase has been completed and the kernel is running normally. This second part includes starting new threads, dealing with hardware interrupts or software exceptions, copying data from/to user applications, serving system calls, and so on.

Is a new port necessary?

As LWN reported about another porting experience in an article published last year, there are three meanings to the word "porting".

It can be a port to a new board with an already-supported processor on it. Or it can be a new processor from an existing, supported processor family. The third alternative is to port to a completely new architecture.

Sometimes, the answer to whether one should start a new port from scratch is crystal clear—if the new processor comes with a new instruction set architecture (ISA), that is usually a good indicator. Sometimes it is less clear. In my case, it took me a couple of weeks to answer this first question.

At the time, May 2013, I had just been hired by the French academic computer lab LIP6 to port the Linux kernel to TSAR, an academic processor architecture that the system-on-chip research group was designing. TSAR is an architecture that follows many of the current trends: lots of small, single-issue, energy-efficient processor cores around a scalable network-on-chip. It also adds some nice innovations: a full-hardware cache-coherency protocol for both data/instruction caches and translation lookaside buffers (TLBs) as well as physically distributed but logically shared memory.

My dilemma was that the processor cores were compatible with the MIPS32 ISA, which meant the port could fall into the second category: "new processor from an existing processor family". But since TSAR had a virtual-memory model radically different from those of any MIPS processors, I would have been forced to drastically modify the entire MIPS branch in order to introduce this new processor, sometimes having almost no choice but to surround entire files with #ifndef TSAR ... #endif.

Quickly enough, it came down to the most logical—and interesting—conclusion:

    mkdir linux/arch/tsar

Get to know your hardware

Really knowing the underlying hardware is definitely the fundamental, and perhaps most obvious, prerequisite to porting Linux to it.

The specifications of a processor are often—logically or physically—split into at least two parts (as were, for example, the recently published specifications for the new RISC-V processor). The first part usually details the user-level ISA, which basically means the list of user-level instructions that the processor is able to understand—and execute. The second part describes the privileged architecture, which includes the list of kernel-level-only instructions and the various system registers that control the processor's status.

This second part contains the majority—if not the entirety—of the information that makes a port special and thus often prevents the developer from opportunely reusing code from other architectures.

Among the important questions that should be answered by such specifications are:

  • What are the virtual-memory model of the processor architecture, the format of the page table, and the translation mechanism?

    Many processor architectures (e.g. x86, ARM, or TSAR) define a flexible virtual-memory layout. Their virtual address space can theoretically be split any way between the user and kernel spaces—although the default layout for 32-bit processors in Linux usually allocates the lower 3GiB to user space and reserves the upper 1GiB for kernel space. In some other architectures, this layout is strongly constrained by the hardware design. For instance, on MIPS32, the virtual address space is statically split into two regions of the same size: the lower 2GiB is dedicated to user space and the upper 2GiB to kernel space; the latter even contains predefined windows into the physical address space.

    The format of the page table is intimately linked to the translation mechanism used by the processor. In the case of a hardware-managed mechanism, when the TLB—a hardware cache of limited size containing recently used translations between virtual and physical addresses—does not contain the translation for a given virtual address (referred to as a TLB miss), a hardware state machine will transparently fetch the proper translation from the page table structure in memory and fill the TLB with it. This means that the format of the page table must be fixed—and certainly defined by the processor's specifications. In a software-based mechanism, a TLB miss exception is handled by a piece of code, which theoretically leaves complete liberty as to how the page table is organized—only the format of TLB entries is specified. (A sketch of such a software refill handler appears after this list.)

  • How does one enable/disable interrupts, switch from privileged mode to user mode and vice versa, get the cause of an exception, and so on?

    Although all these operations generally only involve reading and/or modifying certain bit fields in the set of available system registers, they are always very particular to each architecture. It is for this reason that, most of the time, they are actually performed by small chunks of dedicated assembly code.

  • What is the ABI?

    Although one might think that the Application Binary Interface (ABI) concerns only the compilation tools, since it defines the way the stack is formatted into stack frames, how arguments and return values are passed to and from functions, and so on, it is actually essential to be familiar with it when porting Linux. For example, as the recipient of system calls (which are typically defined by the ABI), the kernel has to know where to get the arguments and how to return a value; on a context switch, the kernel must know what to save and restore, as well as what constitutes the context of a thread.
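
Returning to the translation mechanisms above: the software-refill case can be sketched in a few lines of C. Every identifier below is invented, the layout assumes 32-bit addresses with 4KiB pages and a two-level page table, and a real refill handler would be hand-written assembly:

    typedef unsigned long pte_t;

    extern pte_t **current_pgd;                    /* page-directory base */
    extern void tlb_write_entry(unsigned long vaddr, pte_t pte);
    extern void do_page_fault(unsigned long vaddr);

    /* Invoked on a TLB-miss exception: walk the two-level page table
     * and, if a valid translation exists, load it into the TLB;
     * otherwise escalate to the generic page-fault path. */
    void tlb_refill(unsigned long vaddr)
    {
            pte_t *pt = current_pgd[vaddr >> 22];          /* PGD index */

            if (pt && pt[(vaddr >> 12) & 0x3ff])           /* PTE index */
                    tlb_write_entry(vaddr, pt[(vaddr >> 12) & 0x3ff]);
            else
                    do_page_fault(vaddr);
    }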

Get to know the kernel

Learning a few kernel concepts, especially concerning the memory layout used by Linux, will definitely help. I admit it took me a while to understand exactly what the distinction was between low memory and high memory, and between the direct mapping and vmalloc regions.

For a typical and simple port (to a 32-bit processor), in which the kernel occupies the upper 1GiB of the virtual address space, it is usually fairly straightforward. Within this 1GiB, Linux directly maps the lower portion onto the lower portion of the system memory (hence referred to as low memory): if the kernel accesses the address 0xC0000000, the access is redirected to the physical address 0x00000000.
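
In code, that direct mapping is typically nothing more than a constant offset. A minimal sketch, assuming the common 3GiB/1GiB split with RAM starting at physical address zero (valid only for low-memory addresses):

    #define PAGE_OFFSET     0xC0000000UL

    /* Virtual <-> physical conversion for the direct-mapped region. */
    #define __pa(vaddr)     ((unsigned long)(vaddr) - PAGE_OFFSET)
    #define __va(paddr)     ((void *)((unsigned long)(paddr) + PAGE_OFFSET))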

In contrast, on systems with more physical memory than the direct mapping region can cover, the upper portion of the system memory (referred to as high memory) is not normally accessible to the kernel. Other mechanisms must be used, such as kmap() and kmap_atomic(), to gain temporary access to these high-memory pages.
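
For instance, a sketch of zeroing a page that may live in high memory, using the in-kernel kmap_atomic() interface:

    #include <linux/highmem.h>
    #include <linux/string.h>

    /* Create a short-lived kernel mapping for the page, scribble on
     * it, then drop the mapping again. For a low-memory page this
     * degenerates to a cheap direct-mapping lookup. */
    static void zero_any_page(struct page *page)
    {
            void *vaddr = kmap_atomic(page);

            memset(vaddr, 0, PAGE_SIZE);
            kunmap_atomic(vaddr);
    }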

Above the direct mapping region is the vmalloc region, which is controlled by vmalloc(). This allocation mechanism provides memory that is virtually contiguous even though the underlying pages are not necessarily physically contiguous. It is particularly useful for large allocations, for which it may otherwise be impossible to find an equivalent run of contiguous free physical pages.
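
As a quick illustration of the interface:

    #include <linux/errno.h>
    #include <linux/vmalloc.h>

    /* Grab a 4MiB buffer that is virtually contiguous even if no
     * 4MiB run of free physical pages exists, then release it. */
    static int big_buffer_demo(void)
    {
            unsigned char *buf = vmalloc(4 << 20);

            if (!buf)
                    return -ENOMEM;
            /* ... use buf ... */
            vfree(buf);
            return 0;
    }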

Further reading about memory management in Linux can be found in Linux Device Drivers [PDF] and this LWN article.

How to start?

With your head full of the processor's specifications and kernel principles, it is finally time to add some files to this newly created arch directory. But wait ... where and how should we start? As with any porting or even any code that must respect a certain API, the procedure is a two-step process.

First, a minimal set of files that define a minimal set of symbols (functions, variables, defines) is necessary for the kernel to even compile. This set of files and symbols can often be deduced from compilation failures: if compilation fails because of a missing file/symbol, it is a good indicator that it should probably be implemented (or sometimes that some configuration options should be modified). In the case of porting Linux, this approach is particularly relevant when implementing the numerous headers that define the API between the architecture-specific code and the rest of the kernel.

After the kernel finally compiles and is able to be executed on the target hardware, it is useful to know that the boot code is very sequential. That allows many functions to stay empty at first and be implemented gradually, until the system finally becomes stable and reaches the init process. This approach is generally possible for almost all of the C functions executed after the early assembly boot code. However, it is advisable to get the early_printk() infrastructure up and working first; otherwise, debugging can be difficult.

Finally getting started: the minimal set of non-code files

Porting the compilation tools to the new processor architecture is a prerequisite to porting the Linux kernel, but here we'll assume it has already been performed. All that is left to do in terms of compilation tools is to build a cross-compiler. Since at this point it is likely that porting a standard C library has not been completed (or even started), only a stage-1 cross-compiler can be created.

Such a cross-compiler is only able to compile source code for bare metal execution, which is a perfect fit for the kernel since it does not depend on any external library. In contrast, a stage-2 cross-compiler has built-in support for a standard C library.

The first step of porting Linux to a new processor is the creation of a new directory inside arch/, which is located at the root of the kernel tree (e.g. linux/arch/tsar/ in my case). Inside this new directory, the layout is quite standardized:

  • configs/: default configurations for supported systems (i.e. *_defconfig files)
  • include/asm/: headers dedicated to internal use only, i.e. by Linux source code
  • include/uapi/asm/: headers meant to be exported to user space (e.g. to the C library)
  • kernel/: general kernel management
  • lib/: optimized utility routines (e.g. memcpy(), memset(), etc.)
  • mm/: memory management

The great thing is that once the new arch directory exists, Linux automatically knows about it. It only complains about not finding a Makefile, not about this new architecture:

    ~/linux $ make ARCH=tsar
    Makefile: ~/linux/arch/tsar/Makefile: No such file or directory

As shown in the following example, a minimal arch Makefile only has a few variables to specify:

    KBUILD_DEFCONFIG := tsar_defconfig

    KBUILD_CFLAGS += -pipe -D__linux__ -G 0 -msoft-float
    KBUILD_AFLAGS += $(KBUILD_CFLAGS)

    head-y := arch/tsar/kernel/head.o

    core-y += arch/tsar/kernel/
    core-y += arch/tsar/mm/

    LIBGCC := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)
    libs-y += $(LIBGCC)
    libs-y += arch/tsar/lib/

    drivers-y += arch/tsar/drivers/

  • KBUILD_DEFCONFIG must hold the name of a valid default configuration, which is one of the defconfig files in the configs directory (e.g. configs/tsar_defconfig).
  • KBUILD_CFLAGS and KBUILD_AFLAGS define compilation flags, for the compiler and the assembler respectively.
  • {head,core,libs,...}-y list the objects (or subdirectories containing the objects) to be compiled into the kernel image (see Documentation/kbuild/makefiles.txt for detailed information).

Another file that has its place at the root of the arch directory is Kconfig. This file mainly serves two purposes: it defines new arch-specific configuration options that describe the features of the architecture, and it selects arch-independent configuration options (i.e. options that are already defined elsewhere in Linux source code) that apply to the architecture.

As this will be the main configuration file for the newly created arch, its content also determines the layout of the menuconfig command (e.g. make ARCH=tsar menuconfig). It is difficult to give a snippet of the file as it depends very much on the targeted architecture, but looking at the same file for other (simple) architectures should definitely help.
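
That said, the overall shape is common to most ports. A skeletal, entirely hypothetical beginning (the select targets shown are real kernel options, but which ones apply depends on the architecture) might be:

    config TSAR
            def_bool y
            select GENERIC_ATOMIC64
            select GENERIC_IRQ_SHOW

    config MMU
            def_bool y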

The defconfig file (e.g. configs/tsar_defconfig) completes the set of files related to the Linux kernel build system (kbuild). Its role is to define the default configuration for the architecture, which basically means specifying a set of configuration options that will be used as a seed to generate a full configuration for the kernel compilation. Once again, starting from the defconfig files of other architectures should help, but it is still advisable to trim them down, as they tend to activate many more features than a minimal system would ever need—support for USB, IOMMUs, or even filesystems, for example, comes too early at this stage of porting.

Finally, the last "not really code but still really important" file to create is a script (usually located at kernel/vmlinux.lds.S) that instructs the linker how to place the various sections of code and data in the final kernel image. For example, it is usually necessary for the early assembly boot code to be placed at the very beginning of the binary, and it is this script that allows us to do so.
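
A heavily simplified, hypothetical version of such a script (the section layout, entry symbol, and base address are all invented for illustration, and the many sections a real kernel needs are omitted) might look like:

    OUTPUT_ARCH(tsar)
    ENTRY(kernel_entry)

    SECTIONS
    {
            . = 0xC0000000;                  /* kernel's virtual base */

            .head.text : { *(.head.text) }   /* early boot code comes first */
            .text      : { *(.text) }
            .rodata    : { *(.rodata) }
            .data      : { *(.data) }
            .bss       : { *(.bss) }
    }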

Conclusion

At this point, the build system is ready to be used: it is now possible to generate an initial kernel configuration, customize it, and even start compiling from it. However, the compilation stops very quickly since the port still does not contain any code.

In the next article, we will dive into some code for the second portion of the port: the headers, the early assembly boot code, and all the most important arch functions that are executed until the first kernel thread is created.


Development statistics for the 4.2 kernel

By Jonathan Corbet
August 18, 2015
As of this writing, the 4.2-rc7 prepatch is out and the final 4.2 kernel looks to be (probably) on track to be released on August 23. Tradition says that it's time for a look at the development statistics for this cycle. 4.2, in a couple of ways, looks a bit different from recent cycles, with some older patterns reasserting themselves.

At the end of the merge window, there was some speculation as to whether 4.2 would be the busiest development cycle yet. The current record holder is 3.15, which had 13,722 non-merge changesets at the time of its final release. 4.2, which had 13,555 at the -rc7 release, looks to fall a little short of that figure. So we will not have broken the record for the most changesets in any development cycle, but it was awfully close.

One record that did fall, though, is the number of developers contributing code to the kernel. The previous record holder (4.1, at 1,539) didn't keep that position for long; 1,569 developers have contributed to 4.2. Of those developers, 279 have made their first contribution to the Linux kernel. An eye-opening 1.09 million lines of code were added this time around with 285,000 removed, for a net growth of roughly 800,000 lines of code.

The most active developers this time around were:

Most active 4.2 developers
By changesets
    Ingo Molnar                 304   2.2%
    Mauro Carvalho Chehab       203   1.5%
    Herbert Xu                  171   1.3%
    Krzysztof Kozlowski         161   1.2%
    Geert Uytterhoeven          149   1.1%
    Al Viro                     140   1.0%
    Lars-Peter Clausen          137   1.0%
    H Hartley Sweeten           136   1.0%
    Thomas Gleixner             127   0.9%
    Hans Verkuil                124   0.9%
    Tejun Heo                   110   0.8%
    Alex Deucher                 95   0.7%
    Paul Gortmaker               91   0.7%
    Vineet Gupta                 88   0.7%
    Jiang Liu                    84   0.6%
    Christoph Hellwig            79   0.6%
    Hans de Goede                78   0.6%
    Arnaldo Carvalho de Melo     77   0.6%
    Mateusz Kulikowski           74   0.5%
    Takashi Iwai                 73   0.5%
By changed lines
    Alex Deucher             425501   35.7%
    Johnny Kim                33726    2.8%
    Raghu Vatsavayi           14484    1.2%
    Greg Kroah-Hartman        12500    1.0%
    Stephen Boyd              11062    0.9%
    Dan Williams              10736    0.9%
    Hans Verkuil              10641    0.9%
    Narsimhulu Musini         10263    0.9%
    Ingo Molnar                9254    0.8%
    Jakub Kicinski             8531    0.7%
    Herbert Xu                 8515    0.7%
    Yoshinori Sato             7612    0.6%
    Saeed Mahameed             7493    0.6%
    Sunil Goutham              7471    0.6%
    Christoph Hellwig          7384    0.6%
    Vineet Gupta               7171    0.6%
    Mateusz Kulikowski         6852    0.6%
    Maxime Ripard              6767    0.6%
    Sudeep Dutt                6647    0.6%
    Mauro Carvalho Chehab      6422    0.5%

Some years ago, Ingo Molnar routinely topped the per-changesets list, but he has been busy with other pursuits recently. That changed this time around, though, with a massive rewrite of the low-level x86 floating-point-unit management code. Mauro Carvalho Chehab continues to be an active maintainer of the media subsystem, and Herbert Xu's work almost entirely reflects his role as the maintainer of the kernel's crypto subsystem. Krzysztof Kozlowski contributed cleanups throughout the driver subsystem, and Geert Uytterhoeven, despite being the m68k architecture maintainer, did most of his work within the ARM tree and related driver subsystems.

On the "lines added" side, Alex Deucher accounted for nearly half of the entire growth of the kernel this time around with the addition of the new amdgpu graphics driver. Johnny Kim added the wilc1000 network driver to the staging tree, Raghu Vatsavayi added support for Cavium Liquidio Ethernet adapters, Greg Kroah-Hartman removed the obsolete i2o subsystem, and Stephen Boyd removed a bunch of old driver code while adding driver support for QCOM SPMI regulators and more.

The top contributor statistics in recent years have often been dominated by developers generating lots of cleanup patches or reworking staging drivers. One might expect to see a lot of that activity in an especially busy development cycle, but that is not the case for 4.2. Instead, the top contributors include many familiar names and core contributors. One might be tempted to think that the cleanup work is finally approaching completion, but one would be highly likely to be disappointed in future development cycles.

The most active companies supporting development in the 4.2 cycle (of 236 total) were:

Most active 4.2 employers
By changesets
    Intel                  1665   12.3%
    Red Hat                1639   12.1%
    (Unknown)               884    6.5%
    (None)                  884    6.5%
    Samsung                 681    5.0%
    SUSE                    496    3.7%
    Linaro                  449    3.3%
    (Consultant)            412    3.0%
    IBM                     391    2.9%
    AMD                     286    2.1%
    Google                  246    1.8%
    Renesas Electronics     203    1.5%
    Free Electrons          203    1.5%
    Texas Instruments       191    1.4%
    Facebook                176    1.3%
    Oracle                  163    1.2%
    Freescale               156    1.2%
    ARM                     145    1.1%
    Cisco                   142    1.0%
    Broadcom                138    1.0%
By lines changed
    AMD                    438094   36.8%
    Intel                   96331    8.1%
    Red Hat                 62959    5.3%
    (None)                  46140    3.9%
    (Unknown)               41886    3.5%
    Atmel                   34942    2.9%
    Samsung                 29326    2.5%
    Linaro                  22714    1.9%
    Cisco                   21170    1.8%
    SUSE                    18891    1.6%
    Code Aurora Forum       18435    1.5%
    Mellanox                18044    1.5%
    (Consultant)            15234    1.3%
    IBM                     15095    1.3%
    Cavium Networks         14580    1.2%
    Free Electrons          13640    1.1%
    Unisys                  13428    1.1%
    Linux Foundation        12617    1.1%
    MediaTek                11856    1.0%
    Google                  11811    1.0%

Once again, there are few surprises here. At 6.5%, the percentage of changes coming from volunteers is at its lowest point ever. AMD, unsurprisingly, dominated the lines-changed column with the addition of the amdgpu driver. Beyond that, it is mostly the usual companies supporting kernel development in the usual way.

The kernel community depends heavily on its testers and bug reporters; at least some of the time, their contribution is recorded as Tested-by and Reported-by tags in the patches themselves. In the 4.2 development cycle, 946 Tested-by credits were placed in 729 patches, and 682 Reported-by credits were placed in 611 patches. The most active contributors in this area were:

Most active 4.2 testers and reporters
Tested-by credits
    Joerg Roedel                 40   4.2%
    Keita Kobayashi              35   3.7%
    Krishneil Singh              31   3.3%
    Arnaldo Carvalho de Melo     30   3.2%
    Ira Weiny                    24   2.5%
    Doug Ledford                 23   2.4%
    Alex Ng                      22   2.3%
    Aaron Brown                  21   2.2%
    Javier Martinez Canillas     19   2.0%
    ZhenHua Li                   19   2.0%
Reported-by credits
    Wu Fengguang          76   11.1%
    Dan Carpenter         41    6.0%
    Russell King          23    3.4%
    Ingo Molnar           12    1.8%
    Stephen Rothwell      10    1.5%
    Linus Torvalds         8    1.2%
    Hartmut Knaack         7    1.0%
    Huang Ying             6    0.9%
    Christoph Hellwig      5    0.7%
    Sudeep Holla           5    0.7%

The power of Wu Fengguang's zero-day build robot can be seen here; it resulted in 11% of all of the credited bug reports in this development cycle. The work of all of the kernel's testers and bug reporters leads to a more stable kernel release for everybody. The biggest concern with these numbers, perhaps, is that we might still not be doing a thorough job of documenting the contribution of all of our testers and reporters.

All told, the kernel development community continues to run like a well-tuned machine, producing stable kernel releases on a predictable (and fast) schedule. Back in 2010, your editor worried that the community might be headed toward another scalability crisis, but such worries have proved to be unfounded, for now at least. There must certainly be limits to the volume of change that can be managed by the current development model, but we do not appear to have reached them yet.


Patches and updates

Kernel trees

Linus Torvalds: Linux 4.2-rc7
Greg KH: Linux 4.1.6
Sebastian Andrzej Siewior: 4.1.5-rt5
Luis Henriques: Linux 3.16.7-ckt16
Greg KH: Linux 3.14.51
Greg KH: Linux 3.10.87
Ben Hutchings: Linux 3.2.71

Architecture-specific

Core kernel code

Development tools

John Kacur: rt-tests-0.93

Device drivers

Device driver infrastructure

Documentation

Filesystems and block I/O

Memory management

Networking

Security-related

Eric W. Biederman: Bind mount escape fixes
Andreas Gruenbacher: Inode security label invalidation

Virtualization and containers

Miscellaneous
