Bisection divides users and developers

By Jonathan Corbet
April 15, 2008

The last couple of years have seen a renewed push within the kernel community to avoid regressions. When a patch is found to have broken something that used to work, a fix must be merged or the offending patch will be removed from the kernel. It's a straightforward and logical idea, but there's one little problem: when a kernel series includes over 12,000 changesets (as 2.6.25 does), how does one find the patch which caused the problem? Sometimes it will be obvious, but, for other problems, there are literally thousands of patches which could be the source of the regression. Digging through all of those patches in search of a bug can be a needle-in-the-haystack sort of proposition.

One of the many nice tools offered by the git source code management system is called "bisect." The bisect feature helps the user perform a binary search through a range of patches until the one containing the bug is found. All that is needed is to specify the most recent kernel which is known to work (2.6.24, say), and the oldest kernel which is broken (2.6.25-rc9, perhaps), and the bisect feature will check out a version of the kernel at the midpoint between those two. Finding that midpoint is non-trivial, since, in git, the stream of patches is not a simple line. But that's the sort of task we keep computers around for. Once the midpoint kernel has been generated, the person chasing the bug can build and test it, then tell git whether it exhibits the bug or not. A kernel at the new midpoint will be produced, and the process continues. With bisect, the problematic patch can be found in a maximum of a dozen or so compile-boot-test cycles.
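
The search itself can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the history is treated as a strictly linear list (which, as noted above, real git history is not), and `is_bad` is a hypothetical stand-in for one compile-boot-test cycle. It is not how git implements bisect.

```python
import math


def bisect_first_bad(commits, is_bad):
    """Return (first_bad_commit, tests_run) for a linear history.

    commits[0] is assumed known good and commits[-1] known bad.
    is_bad is a hypothetical callback standing in for one
    compile-boot-test cycle on the given commit.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid      # the bug is at mid or earlier
        else:
            good = mid     # the bug was introduced after mid
    return commits[bad], tests


# Illustrative history: entry 0 is the good release, followed by 12,000 changesets.
history = list(range(12001))
culprit = 8675  # made-up position of the offending changeset
found, tests = bisect_first_bad(history, lambda c: c >= culprit)
bound = math.ceil(math.log2(12000))
print(found, tests, bound)
```

Each test halves the range of suspects, so 12,000 changesets need at most ceil(log2(12000)) = 14 tests in this simplified model, in line with the "dozen or so" cycles mentioned above.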

Bisect is not a perfect tool. If patch submitters are not careful, bisect can create a broken kernel when it splits a patch series. The patch which causes a bug to manifest itself may not be the one which introduced the bug. In the worst case, a developer may merge a long series of patches, finishing with one brief change which enables all the code added previously; in this case, bisect will find the final patch, which will only be marginally useful. If the person reporting the bug is running a distributor's kernel, it may be hard to get that kernel in a form which is amenable to the bisection process. Bisection might require unacceptable downtime on the only (production) system which is affected by the bug. And, of course, the process of checking out, building, booting, and testing a dozen kernels is not something which one fits into a coffee break. It requires a certain determination on the part of the tester and quite a bit of time.

All of the points above would suggest that requesting a bisection from a user reporting a bug should be done as a last resort. In that context, it is worth looking at the story of a recent bug report which suggests that some observers, at least, think that kernel developers are relying a little too heavily on this tool. On April 9, Mark Lord reported a regression in the networking stack; after making a couple of guesses, the network developers suggested that the problem be bisected.

Mark replied that he did not have the time to go through a full bisection, and that he would much rather be provided a list of commits which might be at fault. That list was not forthcoming, though; there were no developers who had an idea of where the problem might be and, as it turns out, the developer who introduced the bug lives in a time zone which caused him to miss the discussion. Mark's response was strong:

Years ago, Linus suggested that he opposed an in-kernel debugger mainly because he preferred that we *think* more about the problems, rather than just finding/fixing symptoms. This 100% reliance upon git-bisect is worse than that. It has people now just tossing regressions into the code left and right, knowing that they can toss all of the testing back at the poor folks whose systems end up not working.

Andrew Morton also worries that developers resort too quickly to a bisection request rather than working with users as was once done. Either that, he says, or developers just ignore the report from the beginning.

Other developers have answers to these worries, of course. Kernel developers often are not in a position to reproduce a reported bug; it may depend on the specifics of the user's hardware or workload. So they must depend on the user to try things and inform them when a change fixes the problem. Here's David Miller's view on how things used to work:

In fact, this is what Andrew's so-called "back and forth with the bug reporter" used to mainly consist of. Asking the user to try this patch or that patch, which most of the time were reverts of suspect changes. Which, surprise surprise, means we were spending lots of time bisecting things by hand.

We're able to automate this now and it's not a bad thing.

The other answer that one hears is that the situation now is much different, with far more users, much more code, and more problems to deal with. The old "back and forth" mode was better suited to smaller user and developer communities; in the current world, things must be done differently. David Miller again:

What people don't get is that this is a situation where the "end node principle" applies. When you have limited resources (here: developers) you don't push the bulk of the burden upon them. Instead you push things out to the resource you have a lot of, the end nodes (here: users), so that the situation actually scales.

There is another aspect of the problem which is spoken about a bit less frequently: developers must prioritize bug reports and decide which ones to work on. Unlike some projects, the kernel does not have anybody serving in any sort of bug triage role, so, in the absence of a disgruntled and paying customer, most developers make their own decisions on which problems to try to solve. It should not be surprising that problems with the most complete information are the ones which are most likely to be addressed first.

A bug report with a bisection that fingers a specific commit is a report with very good information, one which is generally easy to resolve. As an example, consider Mark Lord's report again; he did eventually take the time (five hours, apparently) to bisect the problem and report the results; the bug was found and fixed almost immediately thereafter - despite the fact that the responsible developer was still sleeping on the other side of the planet.

Even less spoken about is the fact that quite a few problems are one-off occurrences. Somewhere out there in the world, there is a single user who, due to a highly uncommon mixture of hardware and software, experiences a problem which affects (almost) nobody else. Marginal hardware, out-of-tree patches, and overclocking only make the problem worse. Arjan van de Ven's kernel oops summaries are illustrative in this regard; the statistics for the 2.6.25-rc kernels show that a half-dozen problems account for over half of the reports, while the vast majority of oopses have only a single occurrence.

Kernel developers have learned that this kind of problem report tends to go away by itself; the affected user finds a way around the issue (or just gives up) and nobody else ever complains. One can well argue that trying to chase down this kind of problem is not a good use of a kernel developer's time. The hard part is figuring out which reports are of this variety. One relatively straightforward way is to wait until reports from other users confirm the problem - or until a sufficiently determined user bisects the problem and provides a commit ID. In this sense, bisection serves as a sort of triage mechanism which requires users to perform enough work to show that the problem is real.

So the developers do have very good reasons for requesting bisections from users. That said, there is reason to worry that many users will simply stop sending in bug reports. If the only response they can expect is a bisection request (which they may be in no position to answer), they may see no point in reporting bugs at all. Fewer bug reports is not the path toward more solid kernel releases. So, as useful as it is, bisection will have to be a tool of last resort in most cases. The good news is that the development community does seem to understand that; bisection remains just one of the many tools we have for the isolation and solution of problems.

The not-quite-so-good news is that, as Al Viro and James Morris have pointed out, the real problem is in the review of code so that fewer bugs are created in the first place. That is not a problem which can be solved with bisection.


Bisection divides users and developers

Posted Apr 15, 2008 20:31 UTC (Tue) by jwb (guest, #15467) [Link] (3 responses)

I think that, in general, developers these days expect far too much work on the part of the
user.  I reported a bug against the intel xorg driver package in Ubuntu.  They had imported
some changes from upstream which broke any laptop with a GM965 graphics chip.  I narrowed the
result down to two candidate changes, and the package maintainer still marked my bug as
"incomplete" because, I guess, I didn't narrow it down to _one_ patch.  In other experiences I
have come across projects that expect you to test and report against the tip of the source
tree, even if there's no reason to believe that anything in the tip addresses the problem you
are reporting.  These types of actions are understandable defensive moves on the part of the
developers, but to the user they are off-putting and onerous.

Bisection divides users and developers

Posted Apr 15, 2008 21:06 UTC (Tue) by arjan (subscriber, #36785) [Link]

If you're unhappy with how your distro provides support, that realistically is between you and
your distro.

The upstream project has to draw a line somewhere. I totally agree that blindly asking for
"please test the tip" isn't the right thing, that's just pushing people away for now. At the
same time, if someone, say, reports a bug in the 2.6.9 kernel, it's also not realistic for
kernel developers to work on that. I consider it a reasonable request to the user to at least
use the last or last-but-one released versions; if you're using something earlier it can mean
pretty much two things:
1) you rolled your own - you should be able to roll a more recent version
2) you're using a distro package and don't know how to use a newer version - you should see if
the distro support can help you

Most healthy projects move so fast that a 2 year old version is no longer useful for the
developers to spend time on. This is part of the prioritization thing the article mentioned:
as a developer you end up spending your debug time on those reports which have the highest value
for the time invested. That is a combination of
1) a sufficiently diagnosed bug
2) a bug that hits many people
3) a bug that has a high probability of being unfixed still
   (and the fix being applicable to your development codebase)
4) a bug that can easily be reproduced

The more vectors a bug scores on, the more likely a developer will spend time on the bug. And
that's ok in my view...

Bisection divides users and developers

Posted Apr 15, 2008 21:52 UTC (Tue) by epa (subscriber, #39769) [Link]

I guess there's a difference between a bug report and a support request.

Clearly if a bug has been found, a bug report explaining how to reproduce it is not
incomplete.  All you need is instructions on how to reproduce the behaviour, and evidence from
documentation (or from wise people) that it is indeed a bug.

However if you expect something to be done to fix the bug, you have to rely on someone being
motivated to fix it.  That could be the project maintainer as a labour of love, or it could be
someone you pay for support.  Or if you are not paying cash, you may be expected to do some of
the work yourself, for example running git bisect.

Similarly, if the ancient foo-1.2 release is still being 'maintained', then any bug report
against that version is valid.  But to get support you may be expected to put in some work
yourself checking out the very latest code.  I agree that this can be offputting and some
projects are surely losing out on help they might get from users, by making the users jump
through too many hoops.

Bisection divides users and developers

Posted Apr 24, 2008 7:11 UTC (Thu) by jmspeex (subscriber, #51639) [Link]

I've had similar experience even dealing directly with vanilla kernels (full story at
http://kerneltrap.org/Quote/Quality_of_the_Bug_Report ). Long story short, despite working
pretty hard to pinpoint a regression that took days to reproduce, no developer even bothered
to have a look at what could have been broken.

How about a distro-provided bisection facility?

Posted Apr 15, 2008 20:46 UTC (Tue) by JoeBuck (subscriber, #2330) [Link] (6 responses)

Let's say you're a distro, and a user complains that your shiny, newly released kernel has a major regression. Why couldn't the distro itself provide a bisection-generation facility? This could be some combination of pre-built bisections (maybe for the first 2-3 cuts) and nice packaging to automate bisection generation. Ideally the new kernels could be tested in the context of a live CD distribution, to minimize the risks from running unstable kernels.

One could even conceive of a special kernel-testing distro that would run off of a live CD and automate the whole process. The CD (which might be on a USB flash device instead) would just iterate the following process, and the user would only need to wait for the compiles and test for the bug when prompted:

newest_good_kernel = what_I_was_running_before;
oldest_bad_kernel = what_you_shipped_me;
while (more_than_one_rev_between(newest_good_kernel, oldest_bad_kernel)) {
    midpoint_kernel = git_bisect(newest_good_kernel, oldest_bad_kernel);
    if (big_pipe && someone_has_built(midpoint_kernel))
        download(midpoint_kernel);
    else
        compile(midpoint_kernel);
    reboot_and_fire_up(midpoint_kernel);
    tell_user_to_test();
    if (user_says_the_bug_is_still_there())
        oldest_bad_kernel = midpoint_kernel;
    else
        newest_good_kernel = midpoint_kernel;
    reboot_and_fire_up(newest_good_kernel); // known good, for the next build
}
bad_patch = compute_diff(newest_good_kernel, oldest_bad_kernel);
send_report(bad_patch, user_comments);

Some types of bugs, such as file system corruption showing up after a while, would be trickier to test for, and the live CD would have to be able to ask for scratch media, be able to reset it to a known state, etc. But if testing can be made easier for interested users, we'll get more testing.

How about a distro-provided bisection facility?

Posted Apr 15, 2008 21:58 UTC (Tue) by jd (guest, #26381) [Link] (1 responses)

Some distros (Red Hat and SuSE spring to mind) are big enough that some (but not all) bisecting could actually be done automagically on a server at the distro's HQ. I'm picturing something like this:

1) Reasonably tech-savvy user finds repeatable kernel bug or regression
2) Said user is able to produce a sequence of events that lead up to the bug, plus the test that establishes the presence of the bug or regression
3) The script is handed off to a virtual machine at the distro HQ, along with the .config file
4) The script is validated by a human, to prevent accidental or deliberate DoS
5) The VM builds a test kernel, applies the script and checks against the test
6) If the test shows the bug is repeatable on the distro's hardware, the VM uses bisection and the prior step to automatically locate the bug
7) If the bug is in distro-supplied or distro-modified patches, the bug report goes to the distro, otherwise it's handed off to the kernel developers

This method has several advantages. Firstly, if the bug can be easily repeated, it moves the heavy lifting from users to people who (usually) have more powerful hardware at their disposal. Secondly, by distinguishing hardware-specific and hardware-agnostic bugs, there is automatically more information available for debugging. Thirdly, you really want to get to the final destination of having a way of reporting and filtering bug reports that maximizes both the quantity and quality of what kernel developers get, which means the manual parts have to be minimal and reducible by automation.

It also has several disadvantages. More users can bisect than can produce an automatable test plan. It's far harder for an automated system to eliminate non-identical reports that are of the same bug and carry no additional information. Too many automated bug reports may lead to developers ignoring them - and a bug in the bug reporter itself certainly would. So few distros can afford the hardware that would be required to do this well that it would have limited benefit. By necessarily using such high-end hardware, as opposed to what users are likely to have, a lot of hardware-related (and almost all hardware-specific) bugs - which, between the two, will account for a sizeable fraction of all bugs - cannot be automatically bisected on a remote machine. Automated reporting systems cannot answer additional kernel developer questions or carry out additional testing on the developers' behalf.

Ultimately, the question becomes one of how to get the most results from the most testing, given that testing is something programmers generally avoid if possible and the users most likely to do something funky enough to cause a crash are the ones who don't know what they're doing. The semi-automated method above won't solve that last one, though.

How about a distro-provided bisection facility?

Posted Apr 15, 2008 23:05 UTC (Tue) by JoeBuck (subscriber, #2330) [Link]

Certainly this is possible (to have people at the distros do the bisection), and in fact it's already being done, but many kernel bugs that escape into released kernels aren't noticed because they only affect users with very specific hardware. And these, I think, are exactly the cases where kernel developers ask the end user to bisect it, because they have no way on their end to make progress.

How about a distro-provided bisection facility?

Posted Apr 16, 2008 11:29 UTC (Wed) by mjthayer (guest, #39183) [Link] (3 responses)

Would the kernel revisions really have to be distributed as source?  Perhaps they could be
distributed as pre-compiled object files.  This would be much quicker for testing, might do
away with the need for scratch space (if there was enough RAM available for linking), and
space could still be saved by only including the object files which had actually changed since
the last revision in a particular revision on the live CD.  kexec could be used to load the
newly linked kernel.

How about a distro-provided bisection facility?

Posted Apr 16, 2008 11:47 UTC (Wed) by mjthayer (guest, #39183) [Link] (2 responses)

Or perhaps I am thinking too complicated - some sort of binary diffs should do the trick just
as well or better.

How about a distro-provided bisection facility?

Posted Apr 16, 2008 14:05 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

Both of these have the problem that kernel configurations are wildly variable and capable of
enormous variation. This would only be practical for a limited set of distro-compiled kernels
(thus with known .configs), but it might work for them, and that would still be quite useful
(I guess it's more likely that someone who can build their own kernel can bisect it for bugs
as well).

How about a distro-provided bisection facility?

Posted Apr 16, 2008 14:10 UTC (Wed) by mjthayer (guest, #39183) [Link]

I think that the original poster was indeed talking about distribution kernels.

Top issues for 2.6.25-rc with annotations

Posted Apr 15, 2008 20:51 UTC (Tue) by arjan (subscriber, #36785) [Link] (2 responses)

at http://www.kerneloops.org/twentyfive.html I'm trying to add annotation to the various
issues to show fixed/unfixed/external patch etc, as well as a very short description of what
the issue is.

We are fixing the big ones at least.... that to me means we're at least doing something right
in terms of quality.

Top issues for 2.6.25-rc with annotations

Posted Apr 15, 2008 21:33 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link] (1 responses)

Fedora 9 will install the kerneloops package by default, which should give a nice (or not quite
nice, depending on your viewpoint) boost to the stats. It is now in most mainstream distros.

# yum install kerneloops

Have fun.

Top issues for 2.6.25-rc with annotations

Posted Apr 16, 2008 14:39 UTC (Wed) by willy (subscriber, #9762) [Link]

We've just been discussing installing kerneloops in Debian by default.  See
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=475398 (looks like it'll happen for Lenny)

Mark's response was strong

Posted Apr 16, 2008 0:28 UTC (Wed) by clugstj (subscriber, #4020) [Link] (6 responses)

Mark's response was hyperbole ("This 100% reliance on git-bisect").  The fact is, he was
running an unreleased kernel and expected unpaid volunteers to solve his problem.

Any tool that allows users to narrow down a bug is a good thing - whether or not all/some of
them will think it is worth their while to use the tool.

Mark's response was strong

Posted Apr 16, 2008 13:44 UTC (Wed) by kirkengaard (guest, #15022) [Link] (5 responses)

He, too, is an unpaid volunteer. Last I checked, linux-kernel wasn't a client of Real-Time
Remedies Inc, and this seems to have occurred while hobby-hacking on his empeg devices. Once
upon a time, that was an unthinkable distinction -- we were all unpaid volunteers. In this
case, it would be preferable for you to blow that suggestion out of some other orifice. This
is not a corporate "do our work for us" request, this is hacker-to-hacker.

Your following comment is dangerously akin to suggesting that he got what he deserved for
running an unreleased kernel, and that his laziness is the root of the problem.

In the thread process, it is quite obvious that he did test a wide range of kernels (i.e.
2.6.11-2.6.24), and that he did observe the mass number of commit changes around the relevant
close() code in the networking stack. He also provided excellent troubleshooting of the
problem, tracing the error down to exactly what happened (i.e. premature reset of the
connection on close()).

Once bisect was suggested, <http://thread.gmane.org/gmane.linux.kernel/663422>, it became the
solution. Nobody had an answer off the top of their heads -- or informed Mark that the
relevant developer was asleep -- and it became "here, find the rest of the information and
give it to us." Thus the argument about bug reporting being a two-way street, and the
suggestion that Mark expected it to be a one-way street -- ignoring the work he had done
already to report the bug in a very thorough manner. From here, the flamewar threshold was
crossed in short order.

Having the time is the issue. Assuming the timestamps are valid for estimation purpose, the
report was filed at 6:56, and his "If I had the time right now, maybe." comment was at 21:05.
Between, he posted four times, each with more information from his bug-tracking work. That's a
lot of work product.

Be careful about your assumptions when you make off-the-cuff remarks like that. Mark's
response was strong, but not unjustified. This is the way the community has worked in the
past, and the impression he got of "(shrug) Dunno, go bisect." is not hard to see.

Mark's response was strong

Posted Apr 16, 2008 14:59 UTC (Wed) by mb (subscriber, #50428) [Link] (2 responses)

> Having the time is the issue. Assuming the timestamps are valid for estimation purpose, the
> report was filed at 6:56, and his "If I had the time right now, maybe." comment was at 21:05.
> Between, he posted four times, each with more information from his bug-tracking work. That's a
> lot of work product.

In that time he could easily have done a complete bisect instead.
bisect saves time for developers _and_ users.

Mark's response was strong

Posted Apr 16, 2008 17:39 UTC (Wed) by bronson (subscriber, #4806) [Link] (1 responses)

You're ignoring Mark's point.  I think he was right to push back a little.

If the automatic first response of developers is "go bisect it!" then that doesn't save time
for anybody.  Most bugs don't need a full bisection and many bugs won't bisect well (as
noted by the article).

Both parties in this discussion had excellent points.  Ideally devs will have to compromise a
little by considering the bug report for 30 sec, to reduce wild goose chases and to avoid making
users feel like they're getting the runaround.  Users will have to compromise a little more because
they scale.

In an ideal world.  :)

Mark's response was strong

Posted Apr 16, 2008 17:58 UTC (Wed) by mb (subscriber, #50428) [Link]

> If the automatic first response of developers is "go bisect it!" then that doesn't save time
> for anybody.  Most bugs don't need a full bisection and many bugs won't bisect well (as
> noted by the article).

Ok, point accepted. :)

Mark's response was strong

Posted Apr 17, 2008 0:30 UTC (Thu) by clugstj (subscriber, #4020) [Link] (1 responses)

I wasn't being off-the-cuff.  After 14 hours (by your estimation) his bug wasn't fixed and he
goes on a rant?  Seems a bit excessive to me.

Mark's response was strong

Posted Apr 17, 2008 5:18 UTC (Thu) by dlang (guest, #313) [Link]

he wasn't complaining that the bug didn't get fixed in 14 hours, he was complaining that the
attitude of the developers seemed to be "we won't look at the problem until you bisect it"

Bisection divides users and developers

Posted Apr 16, 2008 4:24 UTC (Wed) by imcdnzl (guest, #28899) [Link]

A point was made in the article about how the bisection might make an unworkable or
uncompilable kernel - something I have personally run into a few times. In one of the latest
versions of git (can't remember the version, sorry) you can mark a kernel as unusable after a
bisect and it will then go and get another bisection point.
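
For illustration, the effect of marking a revision as untestable (git offers this as "git bisect skip") can be sketched in Python. This is a simplified model, assuming a linear history and a hypothetical `test` callback that answers "good", "bad" or "skip"; the way git actually chooses the next revision differs. The point it shows is that, once revisions are skipped, the search may end with a short list of suspects rather than a single commit.

```python
def bisect_with_skip(commits, test):
    """Narrow down the first bad commit when some commits cannot be tested.

    commits[0] is assumed known good and commits[-1] known bad.
    test is a hypothetical callback returning 'good', 'bad' or 'skip'.
    Returns the list of remaining suspects, oldest first.
    """
    good, bad = 0, len(commits) - 1
    skipped = set()
    while True:
        candidates = [i for i in range(good + 1, bad) if i not in skipped]
        if not candidates:
            break
        mid = (good + bad) // 2
        # Pick the testable commit closest to the midpoint.
        pick = min(candidates, key=lambda i: abs(i - mid))
        verdict = test(commits[pick])
        if verdict == "skip":
            skipped.add(pick)
        elif verdict == "bad":
            bad = pick
        else:
            good = pick
    # Everything after the last known-good commit, up to and including
    # the first known-bad one, is still a suspect.
    return commits[good + 1:bad + 1]


def make_test(first_bad, unbuildable):
    """Build a made-up test function for the example below."""
    def test(commit):
        if commit in unbuildable:
            return "skip"
        return "bad" if commit >= first_bad else "good"
    return test


history = list(range(16))
suspects = bisect_with_skip(history, make_test(10, {9}))
print(suspects)
```

In this made-up example commit 9 does not build and commit 10 introduced the bug; since 9 can never be tested, both 9 and 10 remain as suspects.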

Bisection divides users and developers

Posted Apr 17, 2008 9:05 UTC (Thu) by dmk (guest, #50141) [Link]

A little bit off-topic, but also mentioned in the thread (in the "the real problem is not this,
but:" way of thread hijacking) was the lack of reviewers, and Al Viro suggested some kind of
independent "per-subsystem-reviews".

I think an excellent idea would be some kind of "this month is the big 'we all review
thisandthat area of the kernel' month!" new-wave PR-thingy!

maybe hosted by the kernelnewbies or janitors...

this could be specially mentored by the responsible developers.

I think the linux-kernel could benefit from something like that.

Bisection divides users and developers

Posted Apr 17, 2008 18:08 UTC (Thu) by appie (guest, #34002) [Link]

All I can think of: how can any developer require a user to grok using git at all? The number
of people savvy enough to do a bisect will be very, very small.
Having someone actually report a bug is probably the tip of the iceberg.
Developers may be volunteering and working in their own time, but if one contributes buggy code,
one should either help in debugging and fixing it or not submit it in the first place.
It's in everyone's interest not to piss off or push away participation from the
non-(kernel)-hackers section of the FOSS community.
I'm not quite sure if it would be feasible, but having a repository of installable kernels in
various stages (i.e. with patches applied) of the process would help.

Bisection has other problems

Posted Apr 30, 2008 15:11 UTC (Wed) by eliezert (subscriber, #35757) [Link]

The requirement that patch sets don't break bisection can make things very hard on driver
maintainers.

Let's say I have replaced 30% of my driver code with newer, better code.
It's extremely hard to break the changes into patches that are
logically separate, so that they can be reviewed, on the one hand, and that, on the other, do not
break anything, so bisection works.

Maybe the way to solve this is to have bisection by patch-sets rather than by individual
patches.
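
A rough sketch of what bisecting by patch-sets could look like, in Python. This is only an illustration of the idea, not an existing git feature: the `patch_sets` structure, the set names and the `is_bad` callback are all made up for the example. The search only ever tests the tree as it stands after the last patch of a set, so intermediate states inside a set never need to build or work.

```python
def bisect_patch_sets(patch_sets, is_bad):
    """Return the name of the first patch set after which the tree is bad.

    patch_sets is an ordered list of (name, [commits]) pairs. The tree
    before the first set is assumed good and the tree after the last set
    is assumed bad. is_bad is a hypothetical callback that tests the
    tree as of the given commit.
    """
    good, bad = -1, len(patch_sets) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        last_commit = patch_sets[mid][1][-1]  # test only at set boundaries
        if is_bad(last_commit):
            bad = mid
        else:
            good = mid
    return patch_sets[bad][0]


# Made-up history: the bug arrives with "b2"; "b1" alone may not even build,
# but it is never tested because it is not the end of a patch set.
patch_sets = [
    ("net-cleanup", ["a1", "a2"]),
    ("driver-rewrite", ["b1", "b2", "b3"]),
    ("fs-fixes", ["c1"]),
]
order = [c for _, commits in patch_sets for c in commits]
guilty_set = bisect_patch_sets(
    patch_sets, lambda c: order.index(c) >= order.index("b2")
)
print(guilty_set)
```

The price is precision: the result is a whole patch set rather than a single patch, so somebody still has to look inside it.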