Stochastic bisection in Git
Bisection looks for problem commits using a binary search. The developer identifies the latest known good commit with git bisect good and the earliest known commit showing the bug with git bisect bad. Git will then find a commit near the midpoint between the two, check out that commit, and wait for the developer to try to reproduce the bug. Another git bisect command is used to mark the commit as "good" or "bad", and the process repeats, dividing the range of commits in half each time, until only one commit remains. That commit must be the one that introduced the bug in question.
This technique can be powerful. A bug introduced in a 12,000-commit kernel merge window can be narrowed to a single commit in 14 bisect cycles, which makes the process of finding the actual bug much easier. But it works less well when dealing with bugs that are difficult to reproduce and which, thus, may not manifest in any given testing cycle. A 14-step bisection is 14 opportunities for the developer to provide an incorrect result, and it only takes one such to throw the entire process off. It is not uncommon to see nonsensical bisection results posted to mailing lists; they are often caused by just this kind of problem.
Stochastic bisection is a way of trying to adapt the bisection process to situations where it is not 100% clear whether a bug is present at a given point in the history. It can't say for sure which commit is responsible for the bug, but it can find the commit that is probably at fault. To use this feature, a developer will supply the desired confidence level (as a number between zero and one) with the --confidence option to git bisect start. The process will then continue until one commit is identified as being the most likely source of the bug with the requested confidence level.
From there, the bisection process starts as usual, picking a commit in the middle of the specified range. With each test, though, the developer will indicate their confidence in the result they report with the same --confidence flag. A commit that exhibits the bug is almost certainly bad, so a command like:
git bisect bad --confidence 1.0
would be appropriate. Tests that fail to show the bug may be more ambiguous, though; perhaps the race condition that causes the bug to manifest just didn't trigger this time. If the developer suspects that the bug might be present even though the test indicated otherwise, they can specify a lower confidence level.
Kara included an example in the series cover letter showing how this process would work. It starts with a commit history looking like this:
A--B--C--D-----F-----H--------K \ \ \-E-/ / / \ \--------G-/ / \------------------I--J-/
The problematic commit is I, but the developer only knows that A works and K occasionally shows the problem. They go through the steps, reporting all results (some of which are incorrect) with a confidence level of 0.7; eventually Git reports that commit I is the likely culprit with the requested confidence level of 0.9.
One thing from the example will jump out at developers who have used bisection in the past. A normal bisection process will not present the same commit to the developer for testing twice; once a verdict has been rendered on any given commit, it will not change. Stochastic bisection, instead, will ask the developer to rerun tests on commits that have been seen before, sometimes multiple times in a row. The result is a rather longer bisection process. Given the 11-commit history shown above, a normal bisection would be done after four steps; the stochastic bisection process, instead, required 14 steps. The relative expansion of a bisection over a larger range of commits is likely to be significantly less, but it will still make the process longer. A longer process is better than one that yields a nonsense result, though.
Perhaps the most substantive comment in response to this posting came from Taylor Blau. Kara's patches try to share as much logic as possible between normal and stochastic bisection, which is a worthy goal. In the process of joining the two, though, Kara caused the bisection points reached in some of the Git tests to change; the new points are just as valid, but they are different, so Kara had to change the tests as well. Blau would rather see the new feature added without any behavior changes for developers who are not using it. Kara responded that he feels that his solution is the right way to do things, but he also expressed a willingness to change things if the Git developers request it. Nobody has given him an answer on that question at this point.
There are other changes that will be needed, of course, as is usually the
case for a complex patch set, so there should be at least one revised
version posted to the list at some point. Stochastic bisection will not be
useful for all developers, but there are clearly situations where a more
probabilistic approach is required. It would thus be surprising if some
version of this work didn't eventually find its way into the Git mainline.
Posted Dec 10, 2021 17:06 UTC (Fri)
by atnot (subscriber, #124910)
[Link] (14 responses)
Posted Dec 10, 2021 20:05 UTC (Fri)
by NYKevin (subscriber, #129325)
[Link] (13 responses)
The latter probability has two possible interpretations, which are both non-ideal:
1. The probability that, if we drew a random sample from the good/bad distribution, it would be at least as extreme/unlikely as the actual result. This is called the p value, and is widely used in (frequentist) statistics, but you have to be careful when using the p value because it is very easy to misinterpret as (2). A small p value may indicate that you can rule out a possibility, but a large p value tells you nothing useful. The same result might plausibly originate from either distribution, if the distributions significantly overlap, and in that case, the p values will be large for both distributions, which is obviously unhelpful. (If they don't overlap much or at all, then you can just use deterministic bisection together with "rerun the test a couple of times" and the whole problem is trivial.)
The reason we need a prior for (2) is complicated, but essentially boils down to this: We need to know which distribution was more likely to begin with. If a commit is just a simple refactor, with no changes in actual behavior, then that commit is much less likely to cause the regression than (say) a commit which completely rewrites the core algorithm, and the Bayesian interpretation says that we need to include that sort of information in our analysis in the form of prior probabilities. The problem is, there is no universally good way to determine such priors, because the whole point of a prior is that you should come up with it before you've actually looked at the performance data (that's why it's called a "prior" - it comes first). The other problem is that you can't just use 50% everywhere, because you have to update your beliefs across the entire chain of commits every time you do a test. So you could start with 50-50 priors for the first commit, but you had better factor in the outcome of the first test to your priors in subsequent tests, or else the posterior probabilities will be invalid. This is annoying, but not particularly hard in terms of actual theory.
Posted Dec 11, 2021 8:30 UTC (Sat)
by epa (subscriber, #39769)
[Link] (12 responses)
Posted Dec 14, 2021 13:33 UTC (Tue)
by geert (subscriber, #98403)
[Link] (11 responses)
The hardest bugs are related to changes that look trivial, non-suspicious, and passed both review and testing.
Note that if you know which driver or subsystem contains the buggy change, you can reduce the search space by passing a path to git bisect. I barely use that, though. I'd rather do a full "dumb and numb" bisect to pinpoint the real issue with (some) certainty, than try to be smart, only to discover my initial guess was wrong, and have to do the full bisect anyway later.
Posted Dec 14, 2021 17:03 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (4 responses)
(Sure, I could post these to the ML, but they're just as likely to rot as pie-in-the-sky ideas as here :) .)
[1] For anyone curious, there's new initialization behavior where modifying `sys.path` before `Py_Main` is nuked from orbit when `Py_Main(argc, argv)` reinitializes the interpreter state. You now need to give `argc` and `argv` via `PyConfig` and initialize that way and then use `Py_RunMain()` to avoid this. Available from 3.8, but the behavior change basically forces it for 3.10+.
Posted Dec 14, 2021 19:43 UTC (Tue)
by NYKevin (subscriber, #129325)
[Link] (3 responses)
> Future work to say "crap, I gave some bad information" and rewind a few steps would help quite a bit.
git bisect log and git bisect replay already do exactly that.
> Another one is "please carry around this patch to anywhere that is jumped to because it needs fixed in order to test my problem, but is not the problem I'm trying to figure out right now". Sometimes you can just leave the diff in the working tree, but I'd rather `git bisect` know what it is for and give me a conflict resolution flow instead of just refusing to jump to the next candidate commit.
You can build it yourself out of git bisect run, although built-in functionality would be Nice To Have. But if you've already turned the test into a single command, writing a wrapper script that applies/reverts a fixed patch is not *that* much extra work.
Posted Dec 14, 2021 20:50 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Oh nifty. I knew of `log`, but had missed `replay`. However, I feel that `git bisect undo [n]` would also be useful (at least to avoid touching files all over the place in the process of restarting since I suspect that the replay is not done purely in-index quite yet[1]).
> You can build it yourself out of git bisect run, although built-in functionality would be Nice To Have. But if you've already turned the test into a single command, writing a wrapper script that applies/reverts a fixed patch is not *that* much extra work.
Sure, but that's generally a "hindsight is 20/20" situation. The problem is usually that such things are discovered in the process of bisecting, so by the time I have a robust reproducer script to give to `git bisect run`, I've already done a bisection…
[1] Willing to be pleasantly surprised though :) .
Posted Dec 14, 2021 21:24 UTC (Tue)
by NYKevin (subscriber, #129325)
[Link]
#!/bin/bash
patch -p2 /path/to/file.diff # Replace 2 with the correct number.
You would still need to handle the "what if patch(1) fails?" case, though.
Posted Dec 23, 2021 17:54 UTC (Thu)
by zblaxell (subscriber, #26385)
[Link]
Speaking of rebase, that would be an awesome workflow for git bisect. Check out the commit to be tested, then cherry-pick a list of patches, with quick options to skip or take over manually (or leverage the git-rebase machinery). When a test result is received, rebase the list of patches to the next commit to be tested. Keep the rebase branch heads around in the log, so that later bisects over the same range don't need to have rerere done to them over and over.
I currently do this by having a script check out the revision chosen by bisect in a separate working tree, cherry-pick a stack of patches (and ignore patches that are known to fail if and only if they're not needed), then build and test that instead of what bisect has checked out. Sometimes I need to have different patches on different revision ranges but it doesn't happen often enough to bother automating it.
That's all assuming I don't give up on git bisect completely, and just take a flat list of commits and run the bisection algorithm myself. After implementing the editable log, the cherry-picking, and adding other conveniences like a place to store annotations about test results (admittedly this is most useful for recording statistical data to compute confidence in the results), getting a list of revisions to test and running a search algorithm against them is not that much extra work...and then what's left for git bisect to do?
Posted Dec 14, 2021 19:54 UTC (Tue)
by epa (subscriber, #39769)
[Link] (5 responses)
Posted Dec 14, 2021 22:32 UTC (Tue)
by nix (subscriber, #2304)
[Link] (4 responses)
Posted Dec 15, 2021 10:05 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (2 responses)
Or, in the grand tradition of Unix, invoke an external tool (configurable) to make the judgement call - git already knows about external diff and merge tools (which in theory can be language-specific), so an external tool to identify the "significance" of a change isn't a big stretch.
Posted Dec 15, 2021 16:37 UTC (Wed)
by nijhof (subscriber, #4034)
[Link] (1 responses)
Posted Dec 15, 2021 16:47 UTC (Wed)
by epa (subscriber, #39769)
[Link]
I have lots of commits like 'whitespace' or 'renamed variable' or 'changed from private to public' where I am only 99% sure. So I would like git bisect to mostly disregard them when picking its midpoint to test, but of course, if one of these commits is the only candidate between the current bad and good labels, it should be built and tested.
That will also speed up bisection by making the search tree shallower (assuming the prior probabilities or scores assigned to each commit are broadly reliable). Doing the normal bisection and then skipping an individual commit will give a smaller speedup.
Posted Dec 15, 2021 12:15 UTC (Wed)
by epa (subscriber, #39769)
[Link]
Posted Dec 10, 2021 17:46 UTC (Fri)
by MrWim (subscriber, #47432)
[Link] (1 responses)
> BBChop is like 'git bisect' (or equivalent), but works when your bug is intermittent. That is, it works in the presence of false negatives (when a version happens to work this time even though it contains the bug). It assumes that there are no false positives (in principle, the same approach would work, but adding it may be non-trivial).
I've not used BBChop, it's just something that I came across a while ago - but it seems relevant here.
Note also BBChop seems to have a great weakness:
> Unlike 'git bisect', bbchop is not integrated with a version control system. It has hooks to allow it to be integrated with a version control system
Posted Dec 10, 2021 19:33 UTC (Fri)
by JoeBuck (subscriber, #2330)
[Link]
Posted Dec 10, 2021 20:33 UTC (Fri)
by droundy (subscriber, #4559)
[Link]
Posted Dec 10, 2021 21:04 UTC (Fri)
by andy_shev (subscriber, #75870)
[Link] (1 responses)
Posted Dec 13, 2021 16:20 UTC (Mon)
by jengelh (guest, #33263)
[Link]
Posted Dec 11, 2021 3:11 UTC (Sat)
by david.a.wheeler (subscriber, #72896)
[Link]
Posted Dec 11, 2021 12:41 UTC (Sat)
by fsateler (subscriber, #65497)
[Link] (2 responses)
Posted Dec 12, 2021 14:19 UTC (Sun)
by gray_-_wolf (subscriber, #131074)
[Link]
Posted Dec 14, 2021 13:29 UTC (Tue)
by jan.kara (subscriber, #59161)
[Link]
Stochastic bisection in Git
Stochastic bisection in Git
2. The probability that the random sample was actually drawn from the good/bad distribution. This is obviously a lot closer to what the --confidence flag wants, but the problem with this probability is that it either doesn't exist at all (frequentist interpretation) or exists but must be calculated from a prior probability (Bayesian interpretation), which is a subjective and arbitrary input value.
Stochastic bisection in Git
Stochastic bisection in Git
Stochastic bisection in Git
Stochastic bisection in Git
Stochastic bisection in Git
Stochastic bisection in Git
git show -s --oneline
echo "Press ^D when finished testing this commit."
bash
echo -n "Was this commit (G)ood, (B)ad, or (S)kip? "
read REPLY
git reset --hard HEAD
case "$REPLY" in
g|good) exit 0;;
b|bad) exit 1;;
s|skip) exit 125;;
*) echo "Unrecognized input, aborting."; exit -1;;
esac
Stochastic bisection in Git
Stochastic bisection in Git
Stochastic bisection in Git
Stochastic bisection in Git
You could probably already do that today in git bisect run. In the command/script:
Stochastic bisection in Git
if <this commit is insignificant>; then
exit 125 # tell git bisect to skip this commit.
fi
Stochastic bisection in Git
Stochastic bisection in Git
Git itself can't know that information.
Stochastic bisection in Git
> ...
> BBChop is based on Bayesian Search Theory
(http://en.wikipedia.org/wiki/Bayesian_search_theory ).
Stochastic bisection in Git
Stochastic bisection in Git
Stochastic bisection in Git
Stochastic bisection in Git
Cool idea!
Stochastic bisection in Git
Stochastic bisection in Git
compute-time wise, it would make it possible to look even for very rare bugs
and just letting the bisect run over night with low confidence on the good
results..
Stochastic bisection in Git