Stochastic bisection in Git
Bisection looks for problem commits using a binary search. The developer identifies the latest known good commit with git bisect good and the earliest known commit showing the bug with git bisect bad. Git will then find a commit near the midpoint between the two, check out that commit, and wait for the developer to try to reproduce the bug. Another git bisect command is used to mark the commit as "good" or "bad", and the process repeats, dividing the range of commits in half each time, until only one commit remains. That commit must be the one that introduced the bug in question.
This technique can be powerful. A bug introduced in a 12,000-commit kernel merge window can be narrowed to a single commit in 14 bisect cycles, which makes the process of finding the actual bug much easier. But it works less well when dealing with bugs that are difficult to reproduce and which, thus, may not manifest in any given testing cycle. A 14-step bisection is 14 opportunities for the developer to provide an incorrect result, and it only takes one such to throw the entire process off. It is not uncommon to see nonsensical bisection results posted to mailing lists; they are often caused by just this kind of problem.
Stochastic bisection is a way of trying to adapt the bisection process to situations where it is not 100% clear whether a bug is present at a given point in the history. It can't say for sure which commit is responsible for the bug, but it can find the commit that is probably at fault. To use this feature, a developer will supply the desired confidence level (as a number between zero and one) with the --confidence option to git bisect start. The process will then continue until one commit is identified as being the most likely source of the bug with the requested confidence level.
From there, the bisection process starts as usual, picking a commit in the middle of the specified range. With each test, though, the developer will indicate their confidence in the result they report with the same --confidence flag. A commit that exhibits the bug is almost certainly bad, so a command like:
git bisect bad --confidence 1.0
would be appropriate. Tests that fail to show the bug may be more ambiguous, though; perhaps the race condition that causes the bug to manifest just didn't trigger this time. If the developer suspects that the bug might be present even though the test indicated otherwise, they can specify a lower confidence level.
Kara included an example in the series cover letter showing how this process would work. It starts with a commit history looking like this:
A--B--C--D-----F-----H--------K
\ \ \-E-/ / /
\ \--------G-/ /
\------------------I--J-/
The problematic commit is I, but the developer only knows that A works and K occasionally shows the problem. They go through the steps, reporting all results (some of which are incorrect) with a confidence level of 0.7; eventually Git reports that commit I is the likely culprit with the requested confidence level of 0.9.
One thing from the example will jump out at developers who have used bisection in the past. A normal bisection process will not present the same commit to the developer for testing twice; once a verdict has been rendered on any given commit, it will not change. Stochastic bisection, instead, will ask the developer to rerun tests on commits that have been seen before, sometimes multiple times in a row. The result is a rather longer bisection process. Given the 11-commit history shown above, a normal bisection would be done after four steps; the stochastic bisection process, instead, required 14 steps. The relative expansion of a bisection over a larger range of commits is likely to be significantly less, but it will still make the process longer. A longer process is better than one that yields a nonsense result, though.
Perhaps the most substantive comment in response to this posting came from Taylor Blau. Kara's patches try to share as much logic as possible between normal and stochastic bisection, which is a worthy goal. In the process of joining the two, though, Kara caused the bisection points reached in some of the Git tests to change; the new points are just as valid, but they are different, so Kara had to change the tests as well. Blau would rather see the new feature added without any behavior changes for developers who are not using it. Kara responded that he feels that his solution is the right way to do things, but he also expressed a willingness to change things if the Git developers request it. Nobody has given him an answer on that question at this point.
There are other changes that will be needed, of course, as is usually the
case for a complex patch set, so there should be at least one revised
version posted to the list at some point. Stochastic bisection will not be
useful for all developers, but there are clearly situations where a more
probabilistic approach is required. It would thus be surprising if some
version of this work didn't eventually find its way into the Git mainline.
