
How many -stable patches introduce new bugs?

By Jonathan Corbet
June 28, 2016
The -stable kernel release process faces a contradictory set of constraints. Developers naturally want to get as many fixes into -stable as possible but, at the same time, there is a strong desire to avoid introducing new regressions there. Each -stable release is, after all, intended to be more stable than its predecessor. At times there have been complaints that -stable is too accepting and too prone to regressions, but not many specifics. But, it turns out, this is an area where at least a little bit of objective research can be done.

Worries about -stable regressions

Back in April, Sasha Levin announced the creation of a new extra-stable tree that would only accept security fixes. That proposal was controversial, and, after an initial set of releases, Sasha would appear to have stepped away from this project. While he was defending it, though, he claimed that some -stable patches introduce their own bugs, and offered a suggestion for doubtful developers: "Take a look at how many commits in the stable tree have a 'Fixes:' tag that points to a commit that's also in the stable tree." That is exactly what your editor set out to do.

While kernel changelogs have a fairly well-defined structure, they are still free-form text, so investigating this area requires a certain amount of heuristic and regular-expression work, but it can be done. The first step is to look at the Fixes: tag as suggested by Sasha. Any kernel patch that fixes a bug introduced by another patch is meant to carry a tag like:

   Fixes: 76929ab51f0ee ("kselftests/ftrace: Add hist trigger testcases")

In reality there is some variation in the format, the most common being the word "commit" placed before the ID. One would think that the -stable tree, which is supposed to contain (almost) exclusively fixes, would have a Fixes: tag on almost every commit. In truth, fewer than half of the commits there carry such tags. A few of those without tags are, in fact, straightforward reverts of buggy patches. Git adds a recognizable line to the changelog of reverts so, unless the developer has significantly changed that line, it is easy to determine which patch is being "fixed" when a revert is done.
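A minimal Python sketch of that scanning step follows. It is not the gitdm code; the regular expressions are assumptions that cover only the two Fixes: forms described above (with and without the word "commit") plus the "This reverts commit <id>." line that git revert writes. Real changelogs will need more cases, and the function name is hypothetical.

```python
import re

# "Fixes: <id>" or the common variant "Fixes: commit <id>".
# Accepting 8-40 hex digits is an assumption about abbreviation length.
FIXES_RE = re.compile(
    r"^\s*Fixes:\s*(?:commit\s+)?([0-9a-f]{8,40})\b",
    re.IGNORECASE | re.MULTILINE,
)

# The line that "git revert" places in the changelog of a revert.
REVERT_RE = re.compile(r"^This reverts commit ([0-9a-f]{40})", re.MULTILINE)


def fixed_ids(changelog: str) -> list[str]:
    """Return the (possibly abbreviated) IDs a changelog claims to fix."""
    ids = FIXES_RE.findall(changelog) + REVERT_RE.findall(changelog)
    return [commit_id.lower() for commit_id in ids]


log = """kselftests/ftrace: fix a test

Fixes: 76929ab51f0ee ("kselftests/ftrace: Add hist trigger testcases")
"""
print(fixed_ids(log))  # ['76929ab51f0ee']
```

The IDs come back abbreviated, exactly as the developer wrote them; resolving them to full commits is left to a later step.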

Either way, though, the ID for the patch that introduced the bug is almost invariably the ID used in the mainline tree — not the ID of the patch as it appears in the stable tree. Fortunately, stable-tree patches are required to carry a line like:

   commit d7591f0c41ce3e67600a982bab6989ef0f07b3ce upstream.

The format of that line tends to vary too but, once that is coped with, it turns out that something around 99% of the changesets in the stable tree can be mapped to their mainline equivalent. Or, more to the point, the mapping can be done in the other direction, allowing Fixes: tags to be associated with commits in a specific -stable series. So, when a Fixes: line exists, one can, as a rule, fairly easily determine whether the patch fixes a bug introduced by another -stable patch.
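The mapping step can be sketched the same way, again in Python and only as an illustration. It assumes two spellings of the upstream line: the "commit <id> upstream." form shown above and a bracketed "[ Upstream commit <id> ]" form, which stands in here as an assumed example of the variation. Because Fixes: tags normally carry abbreviated mainline IDs, the lookup matches on prefix. All function names are hypothetical.

```python
import re
from typing import Optional

# Two assumed spellings of the line naming the mainline original.
UPSTREAM_RE = re.compile(
    r"^\s*(?:commit\s+([0-9a-f]{40})\s+upstream"
    r"|\[\s*upstream\s+commit\s+([0-9a-f]{40})\s*\])",
    re.IGNORECASE | re.MULTILINE,
)


def upstream_id(changelog: str) -> Optional[str]:
    """Return the full mainline ID a stable changelog refers to, if any."""
    m = UPSTREAM_RE.search(changelog)
    if m is None:
        return None
    return (m.group(1) or m.group(2)).lower()


def mainline_to_stable(stable_logs: dict) -> dict:
    """Map mainline ID -> stable ID for one -stable series."""
    mapping = {}
    for stable_id, changelog in stable_logs.items():
        mainline_id = upstream_id(changelog)
        if mainline_id is not None:
            mapping[mainline_id] = stable_id
    return mapping


def stable_culprit(fixes_id: str, mapping: dict) -> Optional[str]:
    """If the commit named by a Fixes: tag is also in this -stable series,
    return its stable ID; otherwise return None."""
    prefix = fixes_id.lower()
    for mainline_id, stable_id in mapping.items():
        if mainline_id.startswith(prefix):
            return stable_id
    return None


logs = {
    "stable-a": "commit d7591f0c41ce3e67600a982bab6989ef0f07b3ce upstream.\n",
    "stable-b": "no upstream reference",
}
mapping = mainline_to_stable(logs)
print(stable_culprit("d7591f0c41ce", mapping))  # stable-a
```

A linear prefix scan keeps the sketch short; a sorted list or a trie would be the obvious change for a tree with thousands of commits, and very short prefixes could match more than one commit.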

The results

The most recent long-term support kernel is 4.4, which has had 14 stable updates thus far. Those updates contained 1,712 changesets, 632 of which contained some form of Fixes: tag. Of those, it turns out that 39 were fixes for other patches that had already appeared in 4.4-stable. So just over 2% of the patches going into 4.4-stable have proved (so far) to contain bugs of their own requiring further fixes.

For the curious, here's the full set:

4.4-stable patches with bugs
Introduced                Fixed
v4.4.1 43a2ba8c1a003c82 v4.4.1 0dec73176d5592ca
v4.4.1 b5398ab9d4540c95 v4.4.2 29a928ff8c1055ab
v4.4.1 f5b62074b31a2844 v4.4.3 434e26d6f6a000b8
v4.4.1 5e226f9689d90ad8 v4.4.4 3ba9b9f2409168fb
v4.4.1 e924c60db1b4891e v4.4.10 a9bd748299179a8d
v4.4.1 f50c2907a9b3dfc1 v4.4.1 9497f702ab82314d
v4.4.2 d2081cfe624b5dec v4.4.9 9fed24fe30c1217c
v4.4.2 144b7ecc3bd6fdf7 v4.4.3 a40efb855068a20c
v4.4.2 c9b1074e18b607f5 v4.4.8 4b59a38da5983852
v4.4.2 f2e274ce8bfe8ab9 v4.4.2 6bb06a4fa1894533
v4.4.2 1489f5d951089deb v4.4.10 a7fa0a478a625039
v4.4.3 bbfe21c87bd0f529 v4.4.12 fa5613b1f39ec020
v4.4.3 152fb02241b60ffb v4.4.6 f3c83858c6aee893
v4.4.3 726ecfc321994ec6 v4.4.10 f6ff7398220d7fda
v4.4.3 f4595e0081495b67 v4.4.3 55e0d9869f1d3a6b
v4.4.4 3824f7874a752196 v4.4.6 78939530542f409e
v4.4.4 b36e52c44ce67288 v4.4.6 6f0679556b563bcd
v4.4.4 a83b349814dee660 v4.4.6 f8456804460f5c23
v4.4.4 7ed338d4a9f58d88 v4.4.4 556dfd8dae7d66b3
v4.4.4 7c465723d0b6f262 v4.4.5 c5cbbec54fe71c4d
v4.4.4 fc90441e728aa461 v4.4.5 25e8618619a5a46a
v4.4.4 996c591227d988ed v4.4.7 dc1441612fdb4ca2
v4.4.5 7adb5cc0f39be29c v4.4.6 b59ea3efba4889ec
v4.4.5 e75c4b65150f0997 v4.4.6 97142f3009557c27
v4.4.7 4c8fe4f52755d469 v4.4.9 5a58f809d731c23c
v4.4.7 b1999fa6e8145305 v4.4.9 34af67eb941ae537
v4.4.7 c045105c641ccbeb v4.4.7 19e0783ae96837e3
v4.4.7 dff87fa52ddf26df v4.4.9 5582eb00f5b23622
v4.4.7 8cbac3c4f74d92bf v4.4.13 a87f69dceff5c93a
v4.4.7 7f47aea487df2dc2 v4.4.9 9d58f322ee18ffac
v4.4.7 a918d2bcea6aab6e v4.4.7 6677a2ab036f2813
v4.4.7 791b5b0d2d01542a v4.4.9 67fb098f6f23ebab
v4.4.7 5b5abb9b85e97630 v4.4.9 54aeb5854ec03315
v4.4.9 9d58f322ee18ffac v4.4.9 be5cbaf31cd318f8
v4.4.9 19a4e46b4513bab7 v4.4.11 9df2dc6cf4adb711
v4.4.11 1575c095e444c927 v4.4.14 f5f16bf66d7e07e5
v4.4.12 098942bcf4b1d057 v4.4.14 5e8b53a4db947494
v4.4.14 c9bc125c922e8550 v4.4.14 e9c74337a7c03d33
v4.4.14 2066499780e1455c v4.4.14 fe1e4026ce9f0365

There are a couple of things worth noting in these results. One is that nine of the bugs introduced into 4.4-stable were fixed in the same -stable release — and some were arguably not bugs at all. So those problems almost certainly did not actually affect any -stable users; taking those out reduces the number of actual -stable regressions in 4.4 (so far) to 30. On the other hand, 2/3 of the changes in 4.4-stable carry no Fixes: tag, but the bulk of them should still certainly be bug fixes. Some of them, undoubtedly, fix regressions that appeared in -stable, but, in the absence of somebody with the time, patience, and alcohol required to manually examine nearly 1,100 patches, there is no way to say for sure how many do.

To get some sort of vague sense of the regression rate, one can start with the fact that the number found here constitutes a hard floor: the rate must be at least that high. If one assumes that the regression rate in patches without Fixes: tags is no higher than in those with the tags, a simple ratio (regressions found divided by the number of tagged patches) gives the ceiling for the overall rate. For 4.4, that places the regression rate somewhere in the range 2.3-6.2%. Results from some of the other -stable trees are:

Series   Patches   With Fixes:   # fixed   % regressions
4.6          314           144         2   0.6-1.4%
4.5          973           437         9   0.9-2.1%
4.4        1,712           632        39   2.3-6.2%
3.14       4,779         1,098       105   2.2-9.6%
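The floor and ceiling described above are two divisions: regressions found over all patches in the series, and regressions found over only the patches carrying Fixes: tags. A short Python sketch, illustrative only, with the counts copied from the table, reproduces the last column:

```python
def regression_bounds(patches: int, tagged: int, fixed: int) -> tuple:
    """Return (floor, ceiling) percentages for one -stable series.

    floor:   regressions found / all patches; the true rate cannot be lower.
    ceiling: regressions found / patches with Fixes: tags, assuming the
             rate among untagged patches is no higher than among tagged ones.
    """
    return 100 * fixed / patches, 100 * fixed / tagged


# (patches, patches with Fixes: tags, regressions found), from the table.
SERIES = {
    "4.6": (314, 144, 2),
    "4.5": (973, 437, 9),
    "4.4": (1712, 632, 39),
    "3.14": (4779, 1098, 105),
}

for name, counts in SERIES.items():
    low, high = regression_bounds(*counts)
    print(f"{name}: {low:.1f}-{high:.1f}%")
# 4.6: 0.6-1.4%
# 4.5: 0.9-2.1%
# 4.4: 2.3-6.2%
# 3.14: 2.2-9.6%
```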

In the end, the results are clearly noisy. There are regressions that appear in the -stable tree, and one can make some estimates as to just how many they are. There is no target regression rate for -stable (assuming that a target of zero is unrealistic), so whether the numbers shown above are acceptable or not is probably a matter of perspective — and whether one has been personally burned by a -stable regression or not.

One conclusion that can tentatively be drawn is that the regression rates for more recent kernels seem to be lower. Some portion of that reduction certainly comes from the youth of those kernels — there just hasn't been time to find all of the bugs yet. But it may also be that the efforts that have been made to reduce regressions in -stable (in particular, holding -stable patches until after they have appeared in a mainline -rc release) are having some effect.

In the end, nobody wants to see regressions in the -stable trees. But tightening down on patch acceptance to the point that regressions no longer appear there will almost certainly result in buggier kernels overall, since many good fixes will not be accepted. As with many things in engineering, picking stable patches involves tradeoffs; hopefully the addition of some metrics can help the community to decide whether those tradeoffs are being made correctly.

The code used to generate these results can be found as part of the gitdm collection of cheesy data-mining tools, located at git://git.lwn.net/gitdm.git.


Posted Jun 29, 2016 4:01 UTC (Wed) by sashal (✭ supporter ✭, #81842)

I haven't stepped away from the stable-security project.

The trees are pretty much up to date with the current stable trees.

Posted Jun 29, 2016 4:11 UTC (Wed) by sashal (✭ supporter ✭, #81842)

It's also worth noting that a fix for a stable commit is much riskier than a regular upstream commit, as Dave Chinner describes here: https://www.spinics.net/lists/stable/msg134831.html

"""
I get rather concerned about the stable kernel process when this
starts happening - no-one I know of on the XFS side does any testing
on *any* of the stable kernel releases, and now we're in the
dangerous territory of layered on-disk format fixes being backported
by developers without any specific XFS expertise and there's no
real safety net....
"""

Posted Jun 30, 2016 15:13 UTC (Thu) by jani (subscriber, #74547)

I share Dave's sentiment, though in graphics I don't have to freak out quite as much as I would in filesystems...

Adding that cc: stable tag always feels like a risk. Even if you've tested the commit upstream, that doesn't mean it's the right fix for all the stable kernels out there. There are now 11 stable and longterm kernels, and testing them all before applying that tag is overwhelming. But at the same time we get feedback from users to backport more because they don't necessarily want to run their own kernels to get the fix, preferring to run distro kernels instead. And adding that tag is so convenient for the maintainer, fire and forget. Much easier than following up on fixes later on, and making backport requests when they've seen more testing in the real world.

Then there's the problem of what to do when a cc: stable turns out not to have been such a great idea for a commit. Fire and forget, can't get it back. We don't have a way to "revert" the cc: stable tags. It's history carved in stone. We get the mails from the stable maintainers about backports, but they may slip through anyway. And what might be a good idea for one stable version is not necessarily a good idea for another.

I've entertained this idea about Someone(tm) maintaining a database (maybe just some fancy scripting around git-notes to make it easy to access and distribute) of information about fixes to backport, which fixes have been backported and where, what should not be backported, commits that are referenced by future commits, etc. I spend quite a bit of time doing git archeology like this, and this feels like a thing that could be collaborated on, crowdsourced if you will. It would be great to be able to retroactively add a huge red flag to a commit saying it wasn't so stable after all.

Posted Jun 29, 2016 15:02 UTC (Wed) by Otus (subscriber, #67685)

> One conclusion that can tentatively be drawn is that the regression rates for more recent kernels seem to be lower. Some portion of that reduction certainly comes from the youth of those kernels — there just hasn't been time to find all of the bugs yet.

A more apples-to-apples comparison would be 4.4.14 vs. 3.14.14, ignoring the later updates. Likewise 4.5.7 vs. 4.4.7 vs. 3.14.7.

Posted Jun 29, 2016 15:45 UTC (Wed) by Otus (subscriber, #67685)

Actually, that was easy enough to check, though I couldn't compute upper bounds. Looks like it's just noise rather than improvement.

4.5.7: 9 out of 973 changesets gives a 0.92% floor
4.4.7: 20 out of 1004 changesets gives a 1.99% floor
3.14.7: 5 out of 756 changesets gives a 0.66% floor

4.4.14: 39 out of 1712 changesets gives a 2.28% floor
3.14.14: 13 out of 1260 changesets gives a 1.0% floor
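That cutoff can be sketched in Python by counting only regressions whose fix had landed by a given point release. The (introduced, fixed) pairs below are transcribed from the 4.4 table in the article; the changeset totals (1,004 through 4.4.7, 1,712 through 4.4.14) come from the comment above. This is an illustration, not the script that produced those figures, and the names are hypothetical.

```python
# (introduced, fixed) point releases for each 4.4-stable regression in the
# article's table: (1, 10) means introduced in v4.4.1, fixed in v4.4.10.
PAIRS_4_4 = [
    (1, 1), (1, 2), (1, 3), (1, 4), (1, 10), (1, 1),
    (2, 9), (2, 3), (2, 8), (2, 2), (2, 10),
    (3, 12), (3, 6), (3, 10), (3, 3),
    (4, 6), (4, 6), (4, 6), (4, 4), (4, 5), (4, 5), (4, 7),
    (5, 6), (5, 6),
    (7, 9), (7, 9), (7, 7), (7, 9), (7, 13), (7, 9), (7, 7), (7, 9), (7, 9),
    (9, 9), (9, 11),
    (11, 14), (12, 14), (14, 14), (14, 14),
]


def regressions_visible_by(pairs, cutoff: int) -> int:
    """Count regressions whose fix appeared in a release <= cutoff.

    A fix never precedes its bug, so checking the fix release suffices."""
    return sum(1 for _introduced, fixed in pairs if fixed <= cutoff)


for cutoff, changesets in ((7, 1004), (14, 1712)):
    found = regressions_visible_by(PAIRS_4_4, cutoff)
    print(f"4.4.{cutoff}: {found} of {changesets} changesets, "
          f"{100 * found / changesets:.2f}% floor")
# 4.4.7: 20 of 1004 changesets, 1.99% floor
# 4.4.14: 39 of 1712 changesets, 2.28% floor
```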

Posted Jun 29, 2016 15:05 UTC (Wed) by fredex (subscriber, #11727)

Based on my experience developing code for many years it is rather unlikely that one can change something in a large/complex body of code without also breaking something.

Granted, I don't consider myself to be in the same league as these gurus who work on the kernel, but nevertheless the percentage numbers you derive here I find to be surprisingly low!

All I can say is "Hats off to the kernel gurus!"

Posted Jun 29, 2016 15:43 UTC (Wed) by ballombe (subscriber, #9523)

> On the other hand, 2/3 of the changes in 4.4-stable carry no Fixes: tag, but the bulk of them should still certainly be bug fixes.

Isn't that the root of Sasha's issue? Two-thirds of changes without Fixes: tags is a lot.

Posted Jun 30, 2016 14:34 UTC (Thu) by bmork (subscriber, #88411)

>> On the other hand, 2/3 of the changes in 4.4-stable carry no Fixes: tag, but the bulk of them should still certainly be bug fixes.
>
> Isn't that the root of Sasha's issue? Two-thirds of changes without Fixes: tags is a lot.

I wonder how many of those are new device IDs for drivers? Such patches will not have a Fixes tag and will probably have a lower regression rate than other stable patches.

Posted Jun 29, 2016 16:02 UTC (Wed) by PaulMcKenney (✭ supporter ✭, #9624)

Some data on average bug-introduction rates of fixes from Capers Jones:

Something like 7 percent of your attempts to fix a bug add a new bug that wasn't there before.

Others say that:

this can run as high as 20% for complex, poorly-structured code.

Therefore, if we believe Andrew S. Tanenbaum, the recent -stable trees are doing very well indeed! Still, no reason not to try to do even better.

Posted Jul 1, 2016 6:22 UTC (Fri) by marcH (subscriber, #57642)

> As with many things in engineering, picking stable patches involves tradeoffs; hopefully the addition of some metrics can help the community to decide whether those tradeoffs are being made correctly

In professional software development these tradeoffs and decisions are part of something called "validation", "QA", etc. Any idea how much of this is happening on -stable kernels?

Posted Jul 1, 2016 7:31 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

"It compiles - ship it!"

And sometimes not even that.

Posted Jul 1, 2016 7:46 UTC (Fri) by micka (subscriber, #38720)

> In professional software development

That's a strange distinction, I think. If it's their day work, or part of their day work, or even occasional (not daily) work, and it's software development, then it's "professional software development".

What do you mean by that?

Posted Jul 1, 2016 15:18 UTC (Fri) by marcH (subscriber, #57642)

By "day work" I assume you meant "paid day work". You're right that it's the original meaning, but there's another, casual one.

If your roof still leaks right after you paid a professional to fix it, it's fairly common to dismiss that person as an "amateur". There's a certain level of expectation that comes with people regularly earning money from an activity. It works the other way around too: hobbyists can be complimented as being (as good as) "a pro" even when they're technically not.

Open-source projects are very unusual in a number of ways, one of them being that their master branch is often just a shared base and rarely included "as is" in paid products. Most of their (professional) validation happens on _derived_ forms; see Android for extreme examples.

Posted Jul 1, 2016 18:50 UTC (Fri) by micka (subscriber, #38720)

> If your roof still leaks right after you paid a professional to fix it, it's fairly common to dismiss that person as an "amateur".

Nope. That's a bad professional. Some are good, some are bad.

Posted Jul 1, 2016 18:58 UTC (Fri) by rriggs (guest, #11598)

So what you are saying is that we need an "Angie's List" of stable kernel patch submitters where we get to rate their performance?

Posted Jul 1, 2016 19:11 UTC (Fri) by micka (subscriber, #38720)

No idea what an "Angie's List" is. Nor what it has to do with all this.
I was just pointing out that professional/not professional was probably not the distinction the original comment was trying to draw.

Posted Jul 2, 2016 9:39 UTC (Sat) by hummassa (guest, #307)

Angie's List is a US-based, paid subscription supported website containing crowd-sourced reviews of local businesses.

Posted Jul 2, 2016 22:06 UTC (Sat) by giraffedata (guest, #1954)

> Angie's List is a US-based, paid subscription supported website containing crowd-sourced reviews of local businesses.

Service businesses, I think. Like roofers.

Posted Jul 12, 2016 17:41 UTC (Tue) by Wol (subscriber, #4433)

> Nope. That's a bad professional. Some are good, some are bad.

And the best professionals are all true amateurs ... paid to do the job they love ...

Cheers,
Wol

Posted Aug 4, 2016 18:37 UTC (Thu) by wtarreau (subscriber, #51152)

Well, regressions are painful for everyone and we should strive to limit them to the lowest possible number. But it is also important to understand that a regression (re-)introduces a bug. A missing patch keeps a bug as well.

Overall, what really matters in the end is the reliability that end users observe. The reliability depends on two factors:
- the risk of a fault
- the time it takes to fix a fault

The first one depends directly on the number of unfixed bugs and their respective probabilities of occurrence, so it decreases with the number of fixes and increases with the number of regressions.

The second one increases with the end user's trust in the product. When you deploy 4.4.0 you know the paint is still fresh and you monitor it a lot, ready to revert to the previous version. When you're running on 4.4.16 and the last five 4.4.x used to work fine, you start to watch it a little bit less. When you run on 3.10.100 from a tree that never betrayed you in 2 years, you routinely update to 3.10.101 or 102 without even thinking about it.

So the problem is not about regressions by themselves, but about late regressions, the time to detect them and to fix them. That's a difficulty we have with LTS kernels especially, because as time flies, the distance from mainline increases and certain fixes get harder to backport, and sometimes even to qualify for inclusion. That's why I often say that after some time, the quality of a kernel starts to degrade again (e.g., 2.4 reached a state where it was almost unfixable regarding certain issues: the risk of breaking lots of core stuff was too high for certain minor local DoS issues, so they were documented instead).

It is worth noting that most of the 4.4 regressions above were fixed no later than 2 versions after having been introduced. For the most sensitive users, given that Greg's kernels are released on average once a week, it can make sense to stay 1-2 weeks behind and to skip known bad versions when you see that a bug affects you. That can reduce the risk of seeing such a regression.
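Measuring that delay is a small parsing job on the article's 4.4 table. The Python sketch below is illustrative only: the regular expression assumes the "vX.Y.Z commit vX.Y.Z commit" row layout used in that table, and the function name is hypothetical. It computes how many point releases separate each regression from its fix on a six-row excerpt; run over all 39 rows, the same count gives 28 regressions fixed within two releases.

```python
import re

# "v4.4.1 e924c60db1b4891e v4.4.10 a9bd748299179a8d" -> (1, 10)
ROW_RE = re.compile(
    r"v\d+\.\d+\.(\d+)\s+[0-9a-f]{7,40}\s+v\d+\.\d+\.(\d+)\s+[0-9a-f]{7,40}"
)


def release_lags(rows):
    """Return, per parseable row, the fixed release minus the introduced one."""
    lags = []
    for row in rows:
        m = ROW_RE.match(row.strip())
        if m:
            lags.append(int(m.group(2)) - int(m.group(1)))
    return lags


# Six rows excerpted from the 4.4-stable table in the article.
EXCERPT = [
    "v4.4.1 43a2ba8c1a003c82 v4.4.1 0dec73176d5592ca",
    "v4.4.1 e924c60db1b4891e v4.4.10 a9bd748299179a8d",
    "v4.4.4 3824f7874a752196 v4.4.6 78939530542f409e",
    "v4.4.5 7adb5cc0f39be29c v4.4.6 b59ea3efba4889ec",
    "v4.4.7 8cbac3c4f74d92bf v4.4.13 a87f69dceff5c93a",
    "v4.4.7 dff87fa52ddf26df v4.4.9 5582eb00f5b23622",
]

lags = release_lags(EXCERPT)
within_two = sum(1 for lag in lags if lag <= 2)
print(lags)                                       # [0, 9, 2, 1, 6, 2]
print(within_two, "of", len(lags), "fixed within two releases")
# 4 of 6 fixed within two releases
```

Lines that do not match the row pattern (headers, blank lines) are simply skipped, which keeps the parser tolerant of the table's surrounding text.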

But quite honestly, the probability of ever hitting a bug is very low considering the number of subsystems present. I recall only a few times upgrading to fix a bug that was hitting me. Most upgrades are done just to stay up to date, to limit the risk of bugs, and to stay secure.

And please don't forget that being exposed to thousands of unfixed bugs is always worse for your stability and integrity than risking a few regressions in the whole system's lifetime. Our products have been shipping with 2.6.32, 3.10, 3.14 and will start with 4.4 in a few weeks. The only regressions we faced were during the large upgrades (e.g., 2.6.32 to 3.10), and we faced only one during the whole 3.10 lifetime, which was in fact caused by one of our local patches being applied at the wrong place after its context was changed by a fix for a real bug :-)


Copyright © 2016, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds