Revisiting stable-kernel regressions
obviously correct and tested". Even so, for nearly as long as the kernel community has been producing stable update releases, said community has also been complaining about regressions that make their way into those releases. Back in 2016, LWN did some analysis that showed the presence of regressions in stable releases, though at a rate that many saw as being low enough. Since then, the volume of patches showing up in stable releases has grown considerably, so perhaps the time has come to see what the situation with regressions is with current stable kernels.
As an example of the number of patches going into the stable kernel updates, consider that, as of 4.9.213, 15,648 patches have been added to the original 4.9 release — that is an entire development cycle worth of patches added to a "stable" kernel. Reviewing all of those to see whether each contains a regression is not practical, even for the maintainers of the stable updates. But there is an automated way to get a sense for how many of those stable-update patches bring regressions with them.
The convention in the kernel community is to add a Fixes tag to any patch fixing a bug introduced by another patch; that tag includes the commit ID for the original, buggy patch. Since stable kernel releases are supposed to be limited to fixes, one would expect that almost every patch would carry such a tag. In the real world, about 40-60% of the commits to a stable series carry Fixes tags; the proportion appears to be increasing over time as the discipline of adding those tags improves.
It is a relatively straightforward task (for a computer) to look at the Fixes tag(s) in any patch containing them, extract the commit IDs of the buggy patches, and see if those patches, too, were added in a stable update. If so, it is possible to conclude that the original patch was buggy and caused a regression in need of fixing. There are, naturally, some complications, including the fact that stable-kernel commits have different IDs than those used in the mainline (where all fixes are supposed to appear first); associating fixes with commits requires creating a mapping between the two. Outright reverts of buggy patches tend not to have Fixes tags, so they must be caught separately. And so on. The end result will necessarily contain some noise, but there is a useful signal there as well.
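The matching step described above can be sketched roughly as follows. This is not the stablefixes code; it is a minimal illustration that assumes each stable commit message names its mainline original in one of the two customary forms ("commit <id> upstream." or "[ Upstream commit <id> ]"). The function names and the `(stable_id, message)` input format are hypothetical, and reverts without Fixes tags are not handled here.

```python
import re

# "Fixes: <abbreviated id> ("subject")" trailer in a commit message.
FIXES_RE = re.compile(r"^\s*Fixes:\s*([0-9a-f]{7,40})\b",
                      re.IGNORECASE | re.MULTILINE)

# The two customary ways a stable backport names its mainline commit.
UPSTREAM_RE = re.compile(
    r"^\s*(?:\[\s*Upstream commit ([0-9a-f]{40})\s*\]"
    r"|commit ([0-9a-f]{40}) upstream\.?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def upstream_id(message):
    """Return the mainline commit ID named in a stable commit, or None."""
    match = UPSTREAM_RE.search(message)
    if not match:
        return None
    return (match.group(1) or match.group(2)).lower()


def find_regression_fixes(stable_commits):
    """Find fixes whose target was itself backported to the same series.

    stable_commits is a list of (stable_id, message) tuples (an assumed
    input format). Returns a list of (fix_stable_id, buggy_stable_id).
    """
    # Map mainline IDs to the stable commits that carry them.
    by_upstream = {}
    for stable_id, message in stable_commits:
        mainline = upstream_id(message)
        if mainline:
            by_upstream[mainline] = stable_id

    pairs = []
    for stable_id, message in stable_commits:
        for ref in FIXES_RE.findall(message):
            # Fixes tags carry abbreviated IDs, so match by prefix.
            for mainline, buggy in by_upstream.items():
                if mainline.startswith(ref.lower()):
                    pairs.append((stable_id, buggy))
    return pairs
```

A fix that names a commit never backported to the series produces no pair, which is the behavior the analysis needs: only bugs introduced by the stable updates themselves are counted.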
For the curious, this analysis was done with the stablefixes tool, part of the gitdm collection of repository data-mining hacks. It can be cloned from git://git.lwn.net/gitdm.git.
Back in 2016, your editor came up with a regression rate of at least 2% for the longer-term stable kernels that were maintained at that time. The 4.4 series, which had 1,712 commits then, showed a regression rate of at least 2.3%. Since then, the number of commits has grown considerably — to 14,211 in 4.4.213 — as a result of better discipline and the use of automated tools (including a machine-learning system) to select fixes that were not explicitly earmarked for stable backporting. Your editor fixed up his script, ported it to Python 3, and reran the analysis for the currently supported stable kernels; the results look like this.

| Series | Commits | Tags | Fixes | Reverts |
|---|---|---|---|---|
| 5.4.18 | 2,423 | 1,482 (61%) | 74 | 29 |
| 4.19.102 | 11,758 | 5,647 (48%) | 588 | 100 |
| 4.14.170 | 15,527 | 6,727 (43%) | 985 | 134 |
| 4.9.213 | 15,647 | 6,286 (40%) | 951 | 139 |
| 4.4.213 | 14,210 | 5,110 (36%) | 834 | 124 |

In the above table, Series identifies the stable kernel that was looked at. Commits is the number of commits in that series, while Tags is the number and percentage of those commits with a Fixes tag. The count under Fixes is the number of commits in that series that are explicitly fixing another commit applied to that series. Reverts is the number of those fixes that were outright reverts; a famous person might once have said that reversion is the sincerest form of patch criticism. Hit the "Details" link for a list of the fixes found for each series.
Looking at those numbers would suggest that, for example, 3% of the commits in 5.4.18 are fixing other commits, so the bad commit rate would be a minimum of 3%. The situation is not actually that simple, though, for a few reasons. One of those is that a surprising number of the regression fixes appear in the same stable release as the commits they are fixing. In a case like that, while the first commit can indeed be said to have introduced a regression, no stable release actually contained the regression and no user will have ever run into it. Counting those is not entirely fair. If one subtracts out the same-release fixes, the results look like this:

| Series | Fixes | Same release | Visible regressions |
|---|---|---|---|
| 5.4.18 | 74 | 29 | 45 |
| 4.19.102 | 588 | 176 | 412 |
| 4.14.170 | 985 | 253 | 732 |
| 4.9.213 | 951 | 229 | 722 |
| 4.4.213 | 834 | 232 | 602 |

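The subtraction behind this table is simple enough to check in a few lines of Python. The counts below are copied from the table; `split_same_release()` and its `release_of` mapping (from a stable commit ID to the point release in which it first appeared) are hypothetical names used only to illustrate how the same-release cases might be separated out, and are not taken from the stablefixes tool.

```python
# (fixes, same-release fixes) per series, copied from the table above.
COUNTS = {
    "5.4.18": (74, 29),
    "4.19.102": (588, 176),
    "4.14.170": (985, 253),
    "4.9.213": (951, 229),
    "4.4.213": (834, 232),
}


def visible_regressions(counts):
    """Fixes whose bug shipped in at least one stable release."""
    return {series: fixes - same for series, (fixes, same) in counts.items()}


def split_same_release(pairs, release_of):
    """Partition (fix, buggy) commit pairs by whether both landed together.

    release_of maps a stable commit ID to its point release (assumed input).
    """
    same, visible = [], []
    for fix, buggy in pairs:
        bucket = same if release_of[fix] == release_of[buggy] else visible
        bucket.append((fix, buggy))
    return same, visible
```

Note that this sketch works on (fix, buggy) pairs, while the table counts fixing commits; a fix naming several buggy commits would need to be collapsed to a single entry before the numbers could be compared.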
Another question to keep in mind is what to do with all those commits without Fixes tags. Many of them are certainly fixes for bugs introduced in other patches, but nobody went to the trouble of figuring out how the bugs happened. If the numbers in the table above are taken as the total count of regressions in a stable series, that implies that none of the commits without Fixes tags are fixing regressions, which will surely lead to undercounting regression fixes overall. On the other hand, if one assumes that the untagged commits contain regression fixes in the same proportion as the tagged ones, the result could well be a count that is too high.
Perhaps the best thing that can be done is to look at both numbers, with a reasonable certainty that the truth lies somewhere between them:

| Series | Visible regressions | Regression rate (low) | Regression rate (high) |
|---|---|---|---|
| 5.4.18 | 45 | 1.9% | 3.0% |
| 4.19.102 | 412 | 3.5% | 7.3% |
| 4.14.170 | 732 | 4.7% | 10.9% |
| 4.9.213 | 722 | 4.6% | 11.5% |
| 4.4.213 | 602 | 4.2% | 11.8% |

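Both bounds appear to follow from the counts already given: the low figure divides the visible regressions by all commits in the series, while the high figure divides them by only the commits carrying Fixes tags (which is what assuming the same proportion among untagged commits works out to). The sketch below reproduces the table under that reading; the function name is made up for illustration.

```python
# series: (commits, commits with Fixes tags, visible regressions),
# all copied from the tables above.
ROWS = {
    "5.4.18": (2423, 1482, 45),
    "4.19.102": (11758, 5647, 412),
    "4.14.170": (15527, 6727, 732),
    "4.9.213": (15647, 6286, 722),
    "4.4.213": (14210, 5110, 602),
}


def regression_bounds(commits, tagged, visible):
    """Return (low, high) regression rates as percentages.

    low:  assume no untagged commit fixes a regression.
    high: assume untagged commits fix regressions at the tagged rate.
    """
    low = 100 * visible / commits
    high = 100 * visible / tagged
    return round(low, 1), round(high, 1)


for series, row in ROWS.items():
    low, high = regression_bounds(*row)
    print(f"{series:10} {low:5.1f}% {high:5.1f}%")
```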
So that is about as good as the numbers are going to get, though there are still some oddball issues. Consider the case of mainline commit 4abb951b73ff ("ACPICA: AML interpreter: add region addresses in global list during initialization"). This commit included a "Cc: stable@vger.kernel.org" tag, so it was duly included (as commit 22083c028d0b) in the 4.19.2 release. It was then reverted in 4.19.3, with the complaint that it didn't actually fix a bug but did cause regressions. This same change returned in 4.19.6 after an explicit request. Then, two commits followed in 4.19.35: commit d4b4aeea5506 addressed a related issue and cited the original upstream commit in a Fixes tag, while f8053df634d4 claimed to be the original upstream commit, which had already been applied. That last one looks like a fix for a partially done backport. How does one try to account for a series of changes like that? Honestly, one doesn't even try.
So what can we conclude from all this repository digging? The regression rates seen in 2016 were quite a bit lower than what we are seeing now; that would suggest that the increasing volume of patches being applied to the stable trees is not just increasing the number of regressions, but also the rate of regressions. That is not a good sign. On the other hand, the amount of grumbling about stable regressions seems to have dropped recently. Perhaps that's just because people have gotten used to the situation. Or perhaps the worst problems, such as filesystem-destroying regressions, are no longer getting through, while the problems that do slip through are relatively minor.
Newer kernels have a visibly lower regression rate than the older ones. There are two equally plausible explanations for that. Perhaps the process of selecting patches for stable backporting is getting better, and fewer regressions are being introduced than were before. Or perhaps those kernels just haven't been around for long enough for all of the regressions already introduced to be found and fixed yet. The 2016 article looked at 4.4.14, which had 39 regression fixes (19 fixed in the same release). 4.4.213 now contains 110 fixes for regressions introduced in 4.4.14 or earlier (still 19 fixed in the same release). So there is ample reason to believe that the regression rate in 5.4.18 is higher than indicated above.
In any case, it seems clear that the push to get more and more fixes into
the stable trees is unlikely to go away anytime soon.  And perhaps that is
a good thing; a stable tree with thousands of fixes and a few regressions
may still be far more stable than one without all those patches.  Even so,
it would be good to keep an eye on the regression rate; if that is allowed
to get too high, the result is likely to be users moving away from stable
updates, which is definitely not the desired result.
| Index entries for this article | |
|---|---|
| Kernel | Development model/Stable tree | 

Posted Feb 13, 2020 17:35 UTC (Thu) by arjan (subscriber, #36785)

Posted Feb 13, 2020 17:59 UTC (Thu) by josh (subscriber, #17465)

Posted Feb 13, 2020 18:29 UTC (Thu) by arjan (subscriber, #36785)

maybe the analysis to answer that is how many regressions are in the early stable numbers (so .1 to say .20 or whatever) compared to higher number last digits 
     
Posted Feb 14, 2020 4:44 UTC (Fri) by sashal (✭ supporter ✭, #81842)

Either we're getting better at finding bugs (and we are!), or people are getting more disciplined about tagging commits with the Fixes: tag, but consider the following: 
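$ git log --oneline --no-merges -i --grep "fixes:" v4.4..v4.9 | wc -l
4912
$ git log --oneline --no-merges v4.4..v4.9 | wc -l
67476
$ git log --oneline --no-merges -i --grep "fixes:" v4.14..v4.19 | wc -l
8562
$ git log --oneline --no-merges v4.14..v4.19 | wc -l
69363
$ git log --oneline --no-merges -i --grep "fixes:" v5.0..v5.5 | wc -l
10635
$ git log --oneline --no-merges v5.0..v5.5 | wc -l
70632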
So while only 7.3% of the commits between 4.4 and 4.9 had a Fixes: tag, we see that rate jump to 12.3% of the commits between 4.14 and 4.19, and jump again to 15% between 5.0 and 5.5 - more than double(!) what we saw between 4.4 and 4.9.
I'd argue that if we're seeing an increase of Fixes: tags upstream, we're bound to see a similar increase in stable trees, even if the actual regression rate in stable trees has remained the same (or has gone down - which could explain your observation regarding less grumbling :) ).
     
    
Posted Feb 14, 2020 7:43 UTC (Fri) by cpitrat (subscriber, #116459)

Posted Feb 14, 2020 14:43 UTC (Fri) by sashal (✭ supporter ✭, #81842)

An interesting comparison might be to analyze how many upstream Fixes: tags fix something from a "current" merge window vs older release. 
     
    
Posted Feb 14, 2020 21:21 UTC (Fri) by tytso (subscriber, #9993)

It would seem to me that a really interesting thing to do would be to identify those commits in stable kernels which caused a regression (e.g., which had a commit which had an applicable Fixes line later on), and see if we can identify any kind of machine learning features for commits that are likely to be problematic, and perhaps use that to delay the length of time between when a commit which might be at risk of introducing a regression lands in Linus's tree, and when it gets picked up by a stable branch.
     
    
Posted Feb 14, 2020 21:38 UTC (Fri) by sashal (✭ supporter ✭, #81842)

With regards to your question, I've actually looked into that and did a talk last year (LWN covered it here: https://lwn.net/Articles/803695/). Based on the results, it seemed to me that letting -rc patches (and especially late -rc cycle patches) spend more time in -next would be valuable as those tend to be buggy. 
I raised it with Linus at the Maintainer's summit (https://lwn.net/Articles/799219/): "Sasha Levin asked about whether the same sort of checking happens after -rc1 comes out; the answer was "generally not". Code entering the mainline after the merge window is supposed to be limited to important fixes, and linux-next is less useful for those. As far as Torvalds is concerned, fixes that do not appear in linux-next are not an issue at all. Levin protested that fixes are often broken; putting them in linux-next at least gives continuous-integration systems a chance to find the problems". 
So Linus is just fine with taking patches during -rc cycle that weren't in -next even for a single day, and he isn't too interested in changing that. 
     
    
Posted Feb 14, 2020 22:17 UTC (Fri) by tytso (subscriber, #9993)

But if there is something we can do to decrease the bug introduction rate, that would certainly be a good thing. And that's why I'm suggesting that if we can use ML to figure out which commits contain bug fixes, maybe there is a way that we can use a training set of commits that landed in the stable kernels *and* which apparently had regressions, and see if we can find some features that tell us that those commits should get more careful screening. Whether that's "wait longer", or create a list of commits that can be sent around for humans to take a closer look, I don't have any concrete proposals, because I'm not sure what's the best thing we could do with that information. But I think it's worth some consideration and reflection to see if there's something we can do to further leverage ML; not just to select commits, but to flag commits for special care/handling/testing/review.
Finally, Sasha, please don't take this as a criticism of the job you are currently doing. Bugs and regressions in Linus's tree are inevitable; that's why we have thousands of commits flowing into the stable kernels. But this also means that bugs caused by bug fixes are also inevitable, and so the question is whether there is something we can do to improve the process to deal with the fact that we are all humans. Trying to improve any kind of development or ops process is best done in a blame-free manner.
 
     
    
Posted Feb 15, 2020 1:29 UTC (Sat) by sashal (✭ supporter ✭, #81842)

There is more information about the work here: https://lwn.net/Articles/753329/ . 
 
 
     
Posted Feb 20, 2020 21:27 UTC (Thu) by smfrench (subscriber, #124116)