Kernel development
Brief items
Kernel release status
The current development kernel is 4.7-rc5, released on June 26. Linus said: "I think things are calming down, although with almost two thirds of the commits coming in since Friday morning, it doesn't feel that way - my Fridays end up feeling very busy. But looking at the numbers, we're pretty much where we normally are at this time of the rc series."
Stable updates: 4.6.3, 4.4.14, and 3.14.73 were released on June 24. Among other things, these updates contain an important security fix.
Quote of the week
Bottom line: if you're developing a business critical application on the Internet, you cannot assume that the OSes or the network provide adequate security; you need to take ownership of security for your application. TOU is a step in that direction.
Summary of the Glungezer realtime summit
Daniel Wagner has posted a summary writeup from the realtime summit recently held at a remote location in the Alps. "So Christoph [Hellwig] and I teamed up to organise a small conference on a mountain. With Christoph’s familiarity with the Innsbruck mountains, the Glungezer lodge was chosen. The lodge is located almost on the top of the Glungezer summit, which is reachable by no road or cable car. John Kacur voiced the suspicion that Christoph had a secret plan to promote physical fitness in real-time developers by making them hike through the Austrian Alps. As it turns out, he was right."
WireGuard: a new VPN tunnel
Jason Donenfeld has announced the availability of "WireGuard" for the Linux kernel. "WireGuard is an extremely simple yet fast and modern VPN that utilizes state-of-the-art cryptography. It aims to be faster, simpler, leaner, and more useful than IPSec, while avoiding the massive headache. It intends to be considerably more performant than OpenVPN." The code is in an early state and has never been seen by the community before; indeed, it has not yet even been posted as a patch series. As a result, it may be a while before it shows up in a mainline release. For now, the code and more information can be found at wireguard.io.
Kernel development news
Virtually mapped stacks 2: thread_info strikes back
In last week's episode, Andy Lutomirski had posted a set of patches moving kernel stacks to the kernel's vmalloc() area. There are a number of advantages to doing so, including elimination of some high-order memory allocations, improved security, and better diagnostic output when a stack overflow occurs. There was just one little problem: an additional 1.5µs of kernel overhead during process creation, a cost that Linus was unwilling to accept. The first attempt to address that cost ran afoul of an obscure kernel data structure, but the end result looks to be a substantial cleanup of the kernel's management of process information.

The performance regression comes from using vmalloc() to allocate kernel stacks; vmalloc() is a relatively expensive way to allocate memory and has not been subjected to the same sort of optimization work that the slab allocators have seen. One suggestion that had been made was to cache a small number of kernel stacks, allowing the quick reuse of cache-hot stacks after processes exit. The hope was that, by eliminating vmalloc() calls, the code could be made faster in general.
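For a concrete picture of the two approaches, here is a minimal sketch (not the actual patch, which also sets up guard pages and uses architecture-specific helpers) contrasting the traditional, physically contiguous stack allocation with a virtually mapped one; THREAD_SIZE and THREAD_SIZE_ORDER are the usual kernel definitions:

    static void *alloc_stack_pages(void)
    {
            /* traditional approach: a physically contiguous, high-order allocation */
            return (void *)__get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER);
    }

    static void *alloc_stack_vmalloc(void)
    {
            /* virtually mapped approach: no high-order allocation is needed, but
               the page-table setup must be paid for on every call */
            return vmalloc(THREAD_SIZE);
    }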
A useless cache
Andy went off to implement this idea and reported a discouraging result: "I implemented a percpu cache, and it's useless." The problem, in short, is that a process's resources (including its kernel stack) are not cleaned up immediately when the process exits. Instead, the read-copy-update (RCU) mechanism is used to ensure that no references to those resources remain before they are freed. That means (1) the freeing of the kernel stack will be delayed until the end of the next RCU grace period, and (2) the resources for all processes that exited during that grace period will be freed together. So the cache for kernel stacks will almost always be empty, then will occasionally be hit with large numbers of released stacks, most of which will not fit into the cache and, thus, will be simply freed. In other words, the cache hit rate, especially with fork-heavy loads, will be close to zero.
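The deferral happens because task teardown goes through RCU; a rough sketch of the pattern (the mainline code reaches delayed_put_task_struct() from release_task(), simplified here into a hypothetical helper) looks like this:

    /*
     * Rough sketch: the task, and thus its stack, is only released after a
     * full RCU grace period, so the stacks of many exiting processes tend
     * to be freed in a single burst.
     */
    static void release_task_resources(struct task_struct *p)
    {
            /* ... unhook the task from every place it can be found ... */
            call_rcu(&p->rcu, delayed_put_task_struct);
    }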
In theory there should be no need for a process's kernel stack after that process has died, so one might think that the stack could be released immediately, even if the other data structures need to stay around. The problem is that the core information the kernel maintains about processes lives in two different places:
- The massive task_struct structure, found in <linux/sched.h>. This structure, which is (modulo a disturbing number of #ifdef blocks) architecture-independent, contains most of the information the kernel needs to know about a running process.
- The small thread_info structure, which is architecture-specific.
The task_struct structure is allocated from the heap like most other kernel data structures. The thread_info structure, though, lives at the bottom of the kernel stack, making it impossible to reuse the kernel stack as long as something might try to reference that structure. For a brief period, Linus pursued changes that would allow the thread_info structure to be freed quickly, even while the task_struct structure persisted, but it quickly became clear that no easy solutions were to be found in that direction. Some information in thread_info, the flags field in particular, can be accessed at surprising times and needs to remain available as long as the kernel has any record of the associated process at all.
The existence of these two structures is something of a historical artifact. In the early days of Linux, only the task_struct existed, and the whole thing lived on the kernel stack; as that structure grew, though, it became too large to store there. But placement on the kernel stack conferred a significant advantage: the structure could be quickly located by masking some bits out of the stack pointer, meaning there was no need to dedicate a scarce register to storing its location. For certain heavily used fields, this was not an optimization that the kernel developers wanted to lose. So, when the task_struct was moved out of the kernel-stack area, a handful of important structure fields were left there, in the newly created thread_info structure. The resulting two-structure solution is still present in current kernels, but it doesn't necessarily have to be that way.
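As an illustration of the trick described above, this is roughly how the on-stack thread_info has traditionally been located; this is a simplified, x86-64-flavored sketch rather than the exact kernel code:

    /* With the stack THREAD_SIZE bytes long and THREAD_SIZE-aligned, masking
       the low bits of the stack pointer yields the base of the stack, which
       is where thread_info lives. */
    static inline struct thread_info *current_thread_info(void)
    {
            unsigned long sp;

            asm ("mov %%rsp, %0" : "=r" (sp));
            return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
    }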
Getting rid of thread_info
In relatively recent times, the kernel has moved to using per-CPU variables to store many types of frequently-needed information. The scheduler caches some crucial process-related information about the currently running process in the per-CPU area; that turns out to be faster than looking at the bottom of the kernel stack. So the amount of useful data stored in struct thread_info has decreased over time. Given that, an obvious question comes to mind: could the thread_info structure be removed altogether? The question is not exactly new; moving struct thread_info away from the kernel stack was one of Andy's objectives from the beginning. But the performance issue gave this question a higher level of urgency.
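On x86, for example, the pointer to the current task has long been kept in a per-CPU variable rather than being derived from the stack pointer; in rough outline (a sketch of the idea, not a verbatim copy of the header):

    DECLARE_PER_CPU(struct task_struct *, current_task);

    static __always_inline struct task_struct *get_current(void)
    {
            /* one per-CPU read, no stack-pointer arithmetic needed */
            return this_cpu_read_stable(current_task);
    }

    #define current get_current()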
Linus quickly discovered that some users of the thread_info structure have no need of it, even with no other changes. Mostly they used it to find the task_struct structure — which, typically, they already had a pointer to. He fixed those, and committed the result for the 4.7-rc5 release. This kind of change might not qualify as the sort of bug fix that late-cycle patches are normally restricted to, but he made it clear that he considered them acceptable: "Those are the 'purely legacy reasons for a bad calling convention', and I'm ok with those during the rc series to make it easier for people to play around with this."
He pushed the work further, getting to a point where he could move the (somewhat reduced) thread_info structure off the stack, and embed it within the task_struct instead. That work progressed to where it would boot on a test system; Andy then picked it up and integrated it into his larger patch series.
As of this writing, that series is in its fourth revision. It moves many thread_info fields into task_struct, changing the users of those fields along the way. At the end, the thread_info structure, now containing only the flags field, is moved over to the task_struct as well. Getting there requires a number of changes to the low-level architecture code, so it is an x86-only change at the moment. It seems likely that other architectures will follow along, though; getting rid of the separate thread_info structure is a useful cleanup and security enhancement, even without the rest of the work.
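The end state the series is working toward looks, in rough outline, something like the following; this is a sketch based on the description above, and the configuration symbol shown is illustrative:

    struct thread_info {
            unsigned long flags;    /* the only field left at the end of the series */
    };

    struct task_struct {
    #ifdef CONFIG_THREAD_INFO_IN_TASK
            /* keeping thread_info as the first member lets code convert
               cheaply between the two pointers */
            struct thread_info thread_info;
    #endif
            /* ... the rest of the (massive) structure ... */
    };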
With regard to the real objective of the patch set (moving kernel stacks to the vmalloc() area): the removal of the thread_info structure makes it possible to free the kernel stack as soon as the owning process exits — no RCU grace period required. That, in turn, makes it sensible to add a small per-CPU cache holding up to two free kernel stacks. With the cache, Andy says, the 1.5µs performance regression becomes a 0.5–1µs performance gain.
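A minimal sketch of such a cache might look like this (simplified; the real code must also handle freeing stacks back into the cache, NUMA placement, and accounting):

    #define NR_CACHED_STACKS 2
    static DEFINE_PER_CPU(struct vm_struct *, cached_stacks[NR_CACHED_STACKS]);

    static void *alloc_thread_stack(void)
    {
            int i;

            /* Try to reuse a recently freed, cache-hot stack first. */
            for (i = 0; i < NR_CACHED_STACKS; i++) {
                    struct vm_struct *s = this_cpu_xchg(cached_stacks[i], NULL);

                    if (s)
                            return s->addr;
            }
            /* Slow path: fall back to a fresh virtually mapped allocation. */
            return vmalloc(THREAD_SIZE);
    }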
So, at this point, Andy has a patch series that simplifies some core code, provides immediate detection of kernel-stack overruns, gives better diagnostics when that occurs, improves the security of the kernel, and even makes things run faster. Unsurprisingly, objections are now becoming difficult to find. The only remaining question might be when to merge this code, and the answer appears to be during the 4.8 merge window. That is arguably aggressive, given the fundamental nature of the changes and the fact that there must certainly be a surprise or two still lurking there; indeed, one such surprise is still being worked out as of this writing. But the 4.8 cycle should give time to work through those surprises, and the end result should be worth having.
Parallel pathname lookups and the importance of testing
Parallel pathname lookup is a new development that aims to improve some aspects of Linux filesystem performance. It was discussed at the 2016 Linux Storage, Filesystem, and Memory-Management Summit and, as we reported at the time, it required two key changes, both of which have subtle consequences making them worthy of closer examination.
The first of those changes was to introduce a new state for entries in the directory cache (dcache). As well as being positive ("this name does exist") or negative ("this name doesn't currently exist"), they can now also be in a "don't know" state, called "in-lookup" in the code. If a dentry (dcache entry) is ever found in this state, the filesystem lookup is still in progress and the caller must wait for it to complete. The design of this change favored performance over simplicity, and the resulting complexity makes bugs harder to see.
The second change was to replace the per-directory mutex with a read/write semaphore that allows read operations to proceed in parallel. While simple in principle, this change has had performance implications that can be educational.
As has been described previously, the dcache allows lookups for cached pathnames to proceed quickly, often with only RCU protection that already allows a high degree of parallelism. The recent work doesn't change this but, instead, handles the case where components of a pathname are not present in the cache. Prior to Linux 4.7-rc1, a per-directory mutex would be held while looking up any name in that directory. For a small directory in a local filesystem, this forced serialization is unlikely to be a problem; looking up one file name is likely to bring the directory block containing the name into the page cache, from which subsequent lookups can be performed with no further delay. For large directories, or directories on network-attached filesystems, it is more likely that every directory access will incur a non-trivial latency and the serialization imposed by the mutex can hurt.
While parallel lookups within a single directory make sense, parallel lookups of a single name do not. Thus, the two changes mentioned can be described as adding per-name locking, and then removing per-directory locking, for lookups at least. The "don't know" state for a dentry could also be described as a "locked" dentry.
The idea of a cache lookup returning an incomplete (but locked) object is far from new. It was back in 2002, during the 2.5 development series, that the kernel gained the iget_locked() interface that allows the reading of an inode from disk to be separated from the task of adding the inode to the icache (inode cache). At a coarse level, what we are now seeing is the same improvement being added to the dcache. Looking up names in the dcache happens far more frequently than looking up inodes in the icache, so, given that hotter paths tend to be more heavily optimized, it shouldn't be surprising that the dcache version is not as straightforward as the icache version.
A "don't know" state for dcache entries
The sequence of steps for a lookup with the possibility of "don't know" entries is conceptually straightforward:
1. See if the object is already in the cache.
2. If not:
   a. allocate a new object, flagged as "locked";
   b. repeat the lookup, but this time insert the new object if none was found.
3. If an existing object was found, free the new version (if we allocated it), then wait if the found object is locked.
4. If no existing object was found, initialize the new object completely and unlock it, waking up any process waiting for it.
All of these steps can be seen in the new code, particularly in d_alloc_parallel(), which covers 2a, 2b, and 3. Step 4 can be found in lookup_slow(). Step 1 is separate; it is part of the "fast path" executed when everything is in cache. It is embodied in various calls to lookup_fast(), such as the one in walk_component(). The main source of extra complexity in this code is that a new hash table has been introduced to hold the "in-lookup" dentries. The primary hash table, dentry_hashtable, only holds entries on which lookup has completed and are thus known to be positive or negative; entries are added to the new in_lookup_hash using a separate linkage field (d_u.d_in_lookup_hash) in the dentry so that it can be transiently in both tables. When filesystem lookup completes, the entry is added to the primary hash table and then removed from the in-lookup hash table.
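The separate linkage can be seen in the dentry structure itself; here is an abridged excerpt of the relevant fields, roughly as they appear in include/linux/dcache.h for 4.7, with most members omitted:

    struct dentry {
            /* ... */
            struct hlist_bl_node d_hash;            /* linkage in dentry_hashtable */
            /* ... */
            union {
                    struct list_head d_lru;         /* LRU list */
                    wait_queue_head_t *d_wait;      /* used only by in-lookup dentries */
            };
            /* ... */
            union {
                    struct hlist_node d_alias;
                    struct hlist_bl_node d_in_lookup_hash;  /* linkage in in_lookup_hash */
                    struct rcu_head d_rcu;
            } d_u;
    };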
The lookup in step 2b needs to look in the primary hash table and then the in-lookup hash table, and it needs to be careful of possible races with the entry being moved from the latter to the former once lookup completes. To enable detection of these races a new "bit-seq-lock" is introduced — like a seqlock but with a single bit used as the spinlock.
The value of the secondary hash table is that it allows the insertion of new entries without the need to search the hash chain in the primary table under an exclusive lock. An exclusive lock (obtained with hlist_bl_lock()) is needed to search the hash chain in the secondary table, but that can be expected to be a much shorter chain that is accessed much less often. The exclusive lock on the primary hash chain is only held long enough to attach the dentry once it is ready.
With these concerns in mind, step 2b above can be expanded to:
A. Find the current value of the new per-directory bit-seq-lock.
B. Search the primary hash table with only RCU protection — exit if found.
C. Get an exclusive lock on the in_lookup_hash chain.
D. Check whether the bit-seq-lock has changed. If it has, retry from A. If it hasn't, then we have not yet raced with the target dentry being moved between tables, and the lock we just took will stop the race from happening after this point.
E. Search the in_lookup_hash chain; if nothing is found, insert the new entry that was allocated in 2a.
If the newly allocated dentry was inserted, a waitqueue provided by the caller is stored in the entry, in otherwise unused space, so a wakeup can be sent when the dentry is ready. If an existing, in-lookup dentry was found, then d_alloc_parallel() waits on that waitqueue for the wakeup, and then double checks to ensure that the dentry still looks correct: as no locks were held while waiting, the dentry could already have been renamed or unlinked.
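Putting the steps together, the overall shape of the protocol is roughly as follows. This is a heavily simplified sketch, not the kernel's d_alloc_parallel(); the helper names (read_dir_seq(), lookup_primary_rcu(), and friends) are invented for illustration, and error handling is omitted:

    struct dentry *lookup_or_create(struct dentry *parent, const struct qstr *name,
                                    wait_queue_head_t *wq)
    {
            struct dentry *dnew = d_alloc(parent, name);    /* step 2a */
            struct dentry *found;
            unsigned int seq;

    retry:
            seq = read_dir_seq(parent);                     /* step A */
            found = lookup_primary_rcu(parent, name);       /* step B */
            if (found) {
                    dput(dnew);
                    return found;           /* lookup already complete */
            }

            lock_in_lookup_chain(name);                     /* step C */
            if (read_dir_seq(parent) != seq) {              /* step D: raced, retry */
                    unlock_in_lookup_chain(name);
                    goto retry;
            }
            found = lookup_in_lookup_chain(parent, name);   /* step E */
            if (!found) {
                    dnew->d_wait = wq;      /* stash the caller's waitqueue */
                    add_to_in_lookup_chain(dnew);
                    unlock_in_lookup_chain(name);
                    return dnew;            /* caller now performs the filesystem lookup */
            }
            unlock_in_lookup_chain(name);

            dput(dnew);                                     /* step 3: discard our copy */
            wait_for_lookup_completion(found, wq);          /* sleep until d_in_lookup() clears */
            /* recheck that the dentry wasn't renamed or unlinked while we slept */
            return found;
    }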
With this understanding, it becomes possible to look through d_alloc_parallel() and most of it starts to make sense, though a particularly critical eye might notice
if (d_unhashed(dentry)) continue;
in the middle of the loop performing the search in in_lookup_hash. A similar test appears in other loops that search the primary hash table, so it looks unremarkable at first; it is only surprising if you remember that the two hash tables use different linkages. Since d_unhashed() tests the linkage used by the primary hash table, it really doesn't belong here.
This strangeness is particularly easy to notice with hindsight once you know that J. R. Okajima had been doing some testing and reported problems with this code; together with Al Viro he had narrowed down the problem to exactly this line of code. Fortunately, it will now be gone before 4.7-final is released.
Replacing the exclusive lock with a shared lock
Once per-name locking is in place, replacing the per-directory mutex with a per-directory read/write semaphore and only taking a read (or shared) lock for lookup is fairly straightforward. It has had some interesting consequences though.
As previously reported, Jan Kara expressed some concern at LSFMM about the performance of semaphores. They are not widely used in the kernel and read/write semaphores are inherently more complex than mutexes, so performance regressions seemed at least a theoretical possibility. At the time, Viro reported that he hadn't been able to measure any, but more recently Dave Hansen has found a small "blip" in unlink performance that he was able to narrow down to exactly the change from a mutex to a semaphore. Both mutexes and semaphores take an adaptive approach to waiting for the lock; first they spin for a little while, then they go to sleep and let the scheduler use the CPU for something else. They adapt slightly differently though, with mutexes spinning for longer in some cases. Consequently, using a mutex will waste more CPU time (reducing idle time) but often react more quickly (reducing latency).
Hansen wasn't really sure if this was an important regression or a puzzling inconsistency: "Should we be making rwsems spin more, or mutexes spin less?" he asked. Linus Torvalds felt that the mutex was probably the right approach, since performance matters and "Being slow under lock contention just tends to make for more lock contention".
Meanwhile Waiman Long has a patch set that makes a number of improvements to semaphores that may well address this issue too. So while the change was not as transparent as had been hoped, it appears that the performance of semaphores won't be a cause for concern for long.
In discussions following the original posting of this change, Viro observed that the locking could eventually be reduced further. So, if all goes well, the semaphore might eventually not be needed and any remaining measured regression will go along with it.
The change from exclusive to shared locking brought up another performance issue of a different kind. This issue affects directory reads ("readdir") rather than lookup; readdir was changed to use shared locking at the same time that lookup was changed, and for many of the same reasons. In particular, it affects dcache_readdir(), which is used by filesystems that keep all entries in a directory in the dcache. Specifically, it affects tmpfs.
dcache_readdir() acquires the d_lock spinlock for the directory, and similar locks on the entries in the directory. Previously, when readdir held an exclusive lock on the directory's mutex, these locks would mostly be uncontended and so impose minimal cost. With only a shared lock, it is possible for parallel readdir operations to experience much more contention on these locks. Usually, finer-grained locking improves performance, but when those locks result in more contention events, it can work the other way. As Viro described it when he reported the problem, there is now "an obscene amount of grabbing/releasing ->d_lock [...] And unlike mutex (or rwsem exclusive), contention on ->d_lock chews a lot of cycles."
This difficulty seems well on the way to being resolved with a proposed patch that reduces the number of times that d_lock is claimed. It would not be fair to say that the shared-locking changes created this problem, but it does highlight that, when you make changes to locking rules, strange and unexpected results can certainly appear. This is why ongoing performance testing that looks for regressions, especially in unusual workloads, is so important; it is encouraging to see that happening.
There is clearly a lot of testing happening; even so, as Viro observed separately in the context of some NFS-related races, "we really need a consolidated regression testsuite". Full coverage for network filesystems is more challenging than for local filesystems, in part because it really requires multiple machines. Ad-hoc testing by the community certainly does find bugs, as we have seen here, but it seems that, though we have much more structured testing than we once did, we would still benefit from having more.
How many -stable patches introduce new bugs?
The -stable kernel release process faces a contradictory set of constraints. Developers naturally want to get as many fixes into -stable as possible but, at the same time, there is a strong desire to avoid introducing new regressions there. Each -stable release is, after all, intended to be more stable than its predecessor. At times there have been complaints that -stable is too accepting and too prone to regressions, but not many specifics. But, it turns out, this is an area where at least a little bit of objective research can be done.
Worries about -stable regressions
Back in April, Sasha Levin announced the creation of a new extra-stable tree that would only accept security fixes. That proposal was controversial, and, after an initial set of releases, Sasha would appear to have stepped away from this project. While he was defending it, though, he claimed that some -stable patches introduce their own bugs, and offered a suggestion for doubtful developers: "Take a look at how many commits in the stable tree have a 'Fixes:' tag that points to a commit that's also in the stable tree." That is exactly what your editor set out to do.
While kernel changelogs have a fairly well-defined structure, they are still free-form text, so investigating this area requires a certain amount of heuristics and regular-expression work, but it can be done. The first step is to look at the Fixes: tag as suggested by Sasha. Any kernel patch that fixes a bug introduced by another patch is meant to carry a tag like:
Fixes: 76929ab51f0ee ("kselftests/ftrace: Add hist trigger testcases")
In reality there is some variation in the format, the most common of which is putting the word "commit" before the ID. One would think that the -stable tree, which is supposed to contain (almost) exclusively fixes, would have a Fixes: tag on almost every commit. In truth, fewer than half of the commits there carry such tags. A few of those without tags are, in fact, straightforward reverts of buggy patches. Git adds a recognizable line to the changelog of reverts, so, unless the developer has significantly changed that line, it is easy to determine which patch is being "fixed" when a revert is done.
Either way, though, the ID for the patch that introduced the bug is almost invariably the ID used in the mainline tree — not the ID of the patch as it appears in the stable tree. Fortunately, stable-tree patches are required to carry a line like:
commit d7591f0c41ce3e67600a982bab6989ef0f07b3ce upstream.
The format of that line tends to vary too but, once that is coped with, it turns out that something around 99% of the changesets in the stable tree can be mapped to their mainline equivalent. Or, more to the point, the mapping can be done in the other direction, allowing Fixes: tags to be associated with commits in a specific -stable series. So, when a Fixes: line exists, one can, as a rule, fairly easily determine whether the patch fixes a bug introduced by another -stable patch.
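As a rough illustration of the kind of scanning involved, the following sketch reads "git log" output for a stable branch, recording the stable-to-mainline mapping from the "upstream" lines and noting any Fixes: tags. The branch range is only an example, and the real analysis in gitdm is considerably more careful about format variations:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            char line[1024], stable[41] = "", upstream[41], fixes[41];
            FILE *log = popen("git log --no-merges v4.4..linux-4.4.y", "r");

            if (!log)
                    return 1;
            while (fgets(line, sizeof(line), log)) {
                    if (strncmp(line, "commit ", 7) == 0) {
                            /* start of a new stable changeset */
                            sscanf(line, "commit %40s", stable);
                    } else if (strstr(line, " upstream.")) {
                            /* "commit <id> upstream." maps this patch to mainline */
                            if (sscanf(line, " commit %40s", upstream) == 1)
                                    printf("%s -> mainline %s\n", stable, upstream);
                    } else if (sscanf(line, " Fixes: %40s", fixes) == 1) {
                            /* record which commit this patch claims to fix */
                            printf("%s fixes %s\n", stable, fixes);
                    }
            }
            pclose(log);
            return 0;
    }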
The results
The most recent long-term support kernel is 4.4, which has had 14 stable updates thus far. Those updates contained 1,712 changesets, 632 of which contained some form of Fixes: tag. Of those, it turns out that 39 were fixes for other patches that had already appeared in 4.4-stable. So just over 2% of the patches going into 4.4-stable have proved (so far) to contain bugs of their own requiring further fixes.
For the curious, here's the full set:
There are a couple of things worth noting in these results. One is that nine of the bugs introduced into 4.4-stable were fixed in the same -stable release — and some were arguably not bugs at all. So those problems almost certainly did not actually affect any -stable users; taking those out reduces the number of actual -stable regressions in 4.4 (so far) to 30. On the other hand, two-thirds of the changes in 4.4-stable carry no Fixes: tag, but the bulk of them are almost certainly bug fixes as well. Some of them, undoubtedly, fix regressions that appeared in -stable, but, in the absence of somebody with the time, patience, and alcohol required to manually examine nearly 1,100 patches, there is no way to say for sure how many do.
To get some sort of vague sense of the regression rate, one can start with the fact that the number found here constitutes a hard floor — the rate must be at least that high. If one assumes that the regression rate in patches without Fixes: tags is no higher than in those with tags, a simple ratio gives a ceiling for the overall rate. For 4.4, that places the regression rate somewhere in the range of 2.3-6.2%.
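As a worked example using the 4.4 numbers above (39 regressions, 1,712 total patches, 632 patches carrying Fixes: tags):

    floor   = 39 / 1,712 ≈ 2.3%   (regressions as a share of all patches)
    ceiling = 39 /   632 ≈ 6.2%   (regressions as a share of tagged patches)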
Results from some of the other -stable trees are:

    Series    Patches    With Fixes: tags    Regressions    Regression rate
    4.6           314                 144              2           0.6-1.4%
    4.5           973                 437              9           0.9-2.1%
    4.4         1,712                 632             39           2.3-6.2%
    3.14        4,779               1,098            105           2.2-9.6%
In the end, the results are clearly noisy. There are regressions that appear in the -stable tree, and one can make some estimates as to just how many there are. There is no target regression rate for -stable (assuming that a target of zero is unrealistic), so whether the numbers shown above are acceptable or not is probably a matter of perspective — and whether one has been personally burned by a -stable regression or not.
One conclusion that can tentatively be drawn is that the regression rates for more recent kernels seem to be lower. Some portion of that reduction certainly comes from the youth of those kernels — there just hasn't been time to find all of the bugs yet. But it may also be that the efforts that have been made to reduce regressions in -stable (in particular, holding -stable patches until after they have appeared in a mainline -rc release) are having some effect.
In the end, nobody wants to see regressions in the -stable trees. But tightening down on patch acceptance to the point that regressions no longer appear there will almost certainly result in buggier kernels overall, since many good fixes will not be accepted. As with many things in engineering, picking stable patches involves tradeoffs; hopefully the addition of some metrics can help the community to decide whether those tradeoffs are being made correctly.
The code used to generate these results can be found as part of the gitdm collection of cheesy data-mining tools, located at git://git.lwn.net/gitdm.git.
Patches and updates
Kernel trees
Architecture-specific
Build system
Core kernel code
Development tools
Device drivers
Device driver infrastructure
Documentation
Filesystems and block I/O
Memory management
Security-related
Virtualization and containers
Miscellaneous
Page editor: Jonathan Corbet