Development issues part 2: Bug tracking
There comes a time, though, when even the most die-hard free software proponent wishes that things would just work. As our software finds its way into more situations where failures are unwelcome (at best), the level of tolerance for bugs is falling. The desire for fewer flaws, however, runs counter to the desire for increasingly capable (and thus more complex) software. Somehow we have to find ways to simultaneously grow our systems and reduce the total number of bugs. To this end, a few projects have been having some interesting discussions on the tracking and fixing of bugs.
As has been discussed in this companion article, Eric Raymond has been busily stirring up trouble on the Emacs development list. His point, deemed reasonable by your editor, is that Emacs must adopt a number of relatively modern development practices if it is to have any hope of remaining relevant at all. One of his key points is that Emacs needs to have a real bug tracking system. Says Eric:
While some of Eric's suggestions appear to be non-starters - imagine trying to get Richard Stallman to hang out on an IRC channel - the bug tracker suggestion might just go somewhere. Certainly it could only be an improvement for a project of that size to have some sort of idea of what the current list of outstanding bugs looks like. It might even help bring about another Emacs release before the end of the decade.
Bug trackers are not a magical solution to the bug problem, though; in fact, they can create some problems of their own. The Fedora project, which does have a bug tracker, is currently trying to figure out what to do with the contents of that tracker. It seems that said tracker contains over 13,000 bugs, almost 10,000 of which apply to Fedora 7 and later.
A bug database of this size is simply overwhelming to anybody who tries to do something about it. As a result, Fedora users are filing bugs, only to see nothing happen in response. Not even a "thanks for your report" message. This situation is discouraging for everybody involved, causing Fedora users to give up on reporting bugs and developers to fear looking at the tracker.
In the Fedora case, there appears to be a near-consensus that the biggest problem is in triaging bug entries. This is not a job which can be automated; somebody has to go through bug submissions, weed out the duplicates, identify those which are really "features," figure out which developer should be notified, etc. Tying bug entries to those found in upstream trackers would be a highly useful bonus. Without this sort of effort, the bug tracker quickly fills with low-quality entries which help nobody.
For the most part, nobody is doing this job for Fedora now. Red Hat is not paying for a staff member to triage bugs, and the wider community has not filled this gap. In the short term, any sort of solution looks like it will have to come from the community, so the Fedora folks are wondering what can be done to encourage more participation. Simply asking for help is the obvious first step, as is making sure that the process is easy. Then they may consider the tactics adopted by other large projects - Mozilla's policy of expressing its appreciation by sending a T-shirt, for example.
As an aside, one of the more useful bits of information to come from this discussion was the existence of this family of URLs:
http://bugz.fedoraproject.org/<package-name>
Fill in the name, and the result is an immediate list of open bugs for the given package. Thus, for example, a visit to bugz.fedoraproject.org/gcc yields a list of compiler bugs. This result can be had directly from bugzilla, of course, but this interface is faster and easier.
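As a rough illustration, the URL pattern above can be filled in programmatically. This is only a sketch: the helper name `fedora_bug_list_url` is invented for this example, and the code just builds the address; it does not check whether the service still answers there.

```python
from urllib.parse import quote

# Base address taken from the URL family quoted above.
BUGZ_BASE = "http://bugz.fedoraproject.org/"


def fedora_bug_list_url(package_name: str) -> str:
    """Return the per-package bug list URL for a Fedora package name.

    Hypothetical helper: it fills the <package-name> slot of the pattern
    and percent-encodes characters that are not safe in a URL path.
    """
    name = package_name.strip()
    if not name:
        raise ValueError("package name must not be empty")
    return BUGZ_BASE + quote(name, safe="")


print(fedora_bug_list_url("gcc"))
```

Percent-encoding matters for package names containing characters such as "+", which would otherwise be ambiguous in a URL.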
The Fedora developers have discussed a number of related issues, such as whether the Fedora bug database should be separated from the RHEL system and what can be done to make Red Hat better appreciate the value of doing more of its quality assurance work in the Fedora repository. But the core problem is just getting human attention applied to the bug reports. Digging through bug databases is a relatively unglamorous job; it is not an easy path toward rock-star hacker status. But it is an important and relatively easy way to help make free software better.
Just in time to serve as an example of how well bug management can work, the GNOME project has posted its annual bugzilla statistics. It seems that over 110,000 GNOME bugs were filed in 2007, and almost 109,000 of them were closed. The top bug-closers for the year were:
14254 Andre Klapper
9800 Tom Parker
7047 Susana Pereira
6882 Bruno Boaventura
6649 Pedro Villavicencio
It is worth pondering for a moment on the amount of energy required to close over 14,000 bugs in a year - that's almost 40 per day, every day, without a break. This kind of energy does exist within our community, and some projects are putting it to very good use.
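The per-day figure follows directly from the counts listed above. A quick check, assuming a 365-day year (2007 was not a leap year); the function and variable names are chosen for this example only:

```python
# Bug-closing counts as listed in the GNOME statistics quoted above.
CLOSED_IN_2007 = {
    "Andre Klapper": 14254,
    "Tom Parker": 9800,
    "Susana Pereira": 7047,
    "Bruno Boaventura": 6882,
    "Pedro Villavicencio": 6649,
}

DAYS_IN_2007 = 365  # 2007 was not a leap year


def closes_per_day(total_closed: int, days: int = DAYS_IN_2007) -> float:
    """Average number of bugs closed per calendar day."""
    return total_closed / days


for name, total in CLOSED_IN_2007.items():
    print(f"{name}: {closes_per_day(total):.1f} bugs closed per day")
```

For the top entry this comes out at a little over 39 per day, consistent with the "almost 40" above.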
While it is easy to get a contrary impression, the kernel does, in fact, have a bug tracker; there is also, in the form of Natalie Protasevich, somebody who handles the care and feeding of that tracker. But, as a recent episode shows, that still is not always sufficient to actually get the bugs fixed.
On November 13, 2007, a bug in the SCSI subsystem was reported to the linux-kernel mailing list. It was put into the tracker as bug 9370 on the same day. Some developers looked at it over the next few days, but, even though a specific commit which appeared to cause the bug had been identified, no solution was forthcoming. Discussion eventually died out. At least until January 2, when Ingo Molnar decided to stir the pot by posting a patch to revert the seemingly guilty commit. At that point the discussion picked up and a reliable way of reproducing the bug was found. The commit which was said to have caused the problem was, in fact, not guilty; it had just caused an older bug to come to light. The discussion did not stop there, though.
A number of charges went back and forth which do not require discussion here. But one core point is this: as long as the bug report sat in the tracker, nothing much appeared to be happening with it - though, it seems, the SCSI developers had not forgotten it and were trying to figure out what was really going on. But once the problem came back to the linux-kernel list in the form of a brute-force solution, the root cause was found in short order. The key here was bringing the problem to the attention of a wider group of people; the crucial recipe for reproducing the problem came from a developer who had not been looking at the problem previously.
In the kernel context, at least, giving wide exposure to a bug often helps immensely in getting that bug fixed. That is especially true for the sort of hard-to-reproduce bugs which tend to come up in kernel programming. So, while bug trackers are a useful tool for ensuring that problems do not fall through the cracks, it seems that one of the most potent anti-bug tools we have - discussing the problem via a widely-distributed email list - is the same tool we have been using for decades.
Index entries for this article | |
---|---|
Kernel | Debugging |
Posted Jan 10, 2008 4:04 UTC (Thu)
by jengelh (guest, #33263)
[Link]
Posted Jan 10, 2008 7:52 UTC (Thu)
by pr1268 (guest, #24648)
[Link]
My view is that Fedora's defect tracking system has fallen victim to the Broken Windows Theory. With Fedora's Bugzilla lacking focused leadership, I'm starting to wonder just how long bugs remain in their defect tracking system... I suspect a lot of Fedora's 13K bug reports are duplicates, cosmetic, or "operator error" (I've had a few of those). ;-)
Posted Jan 10, 2008 8:21 UTC (Thu)
by Coren (guest, #39136)
[Link]
Posted Jan 10, 2008 11:15 UTC (Thu)
by tsr2 (subscriber, #4293)
[Link] (2 responses)
Posted Jan 10, 2008 19:05 UTC (Thu)
by kmccarty (subscriber, #12085)
[Link]
"I've submitted a bug or two to Debian and KDE and the lack of any response makes me unlikely to bother again, except for a real show stopper."
It's worth noting that maintainers of popular packages like KDE or X are completely swamped in bugs. (I'm thinking of recent blog entries by Brice Goglin and David Nusinow, two of Debian's X maintainers.) Consider sending a polite "ping" to the bug, or even a patch to fix it if you can, and you might get more of a reaction -- surely it's at least worth a try!
Posted Jan 11, 2008 17:26 UTC (Fri)
by giraffedata (guest, #1954)
[Link]
"a bug tracker is no use if the bugs don't get fixed."
I do, as a user, find it to be quite useful in that case. Assuming the bug tracker has decent search capability, it tells me not to waste time reporting a bug that has already been reported.
Furthermore, in projects where most bugs don't get fixed, it saves me the time of reporting even a new bug. When I want to report a bug in a project with which I'm not familiar, I always first look at a recent sampling of bug reports (whether in a mailing list or formal tracker) to get a feel for whether the project actually fixes bugs or not, and if not, I don't waste my time.
Not fixing bugs can be as simple as ignoring the reports, but it can also be making excuses -- closing the issue with "it's fine the way it is."
Posted Jan 10, 2008 12:10 UTC (Thu)
by hjernemadsen (subscriber, #5676)
[Link]
Posted Jan 10, 2008 13:03 UTC (Thu)
by kirkengaard (guest, #15022)
[Link]
Posted Jan 10, 2008 13:18 UTC (Thu)
by massimiliano (subscriber, #3048)
[Link]
IMHO, a bug tracker is essential in any project larger than a few thousand lines of code (so, every project!), and with a lifetime longer than a few months. When these two measures are exceeded, it is virtually impossible for standard human beings to keep track of issues to be resolved...
And of course I also agree that the bug tracker itself is not enough: there must be a "QA team" that triages the bugs, makes sure they are valid, prioritizes them... in short, makes sure that the developers (the ones that can actually fix the bugs, and which are always a scarce resource) don't waste time navigating in the bug database, and focus on fixing them.
But there's something that nobody is pointing out: regression tests. In my experience, they are the key factor that helps in being sure that as the project grows, the number of bugs stays under control and does not grow exponentially. The same goes for having a build farm that continuously checks out the latest source, builds it and tests it on all the supported platforms, and reports the results in the most accessible way.
And, possibly, having the policy of refusing contributions that do not include automated regression tests.
I know this sounds a bit draconian, but it is the only way to be sure
that the tests have good coverage...
Posted Jan 10, 2008 13:48 UTC (Thu)
by Frej (guest, #4165)
[Link]
Posted Jan 10, 2008 17:18 UTC (Thu)
by iabervon (subscriber, #722)
[Link] (3 responses)
Posted Jan 11, 2008 17:14 UTC (Fri)
by giraffedata (guest, #1954)
[Link] (1 responses)
"Furthermore, the user has to do something different to find out that something is a common user error."
I think a bug tracking system (or, more generally, a bug handling process) is a horrible way to deal with common user errors. I'd rather see effort put into fixing the product design and/or user documentation than providing a way in the bug reporting process for a user to find out that what appears to be a bug really isn't.
For example, I've found that simply producing quality error messages goes a long way toward that.
Posted Jan 11, 2008 21:48 UTC (Fri)
by iabervon (subscriber, #722)
[Link]
Posted Jan 19, 2008 14:45 UTC (Sat)
by fergal (guest, #602)
[Link]
"On the other side, one of the most useful things in resolving a bug is a second report which clarifies or corrects the description of what's actually going on, and this is something that bug tracking systems and triage actively filter out at pretty much all stages."
One way to solve this (and several other problems) is to distinguish between bugs and bug reports. I've only ever seen one bug tracker that did this (written for the Linux kernel but not adopted). Bugs and bug reports are different concepts and yet all popular bug trackers mush the 2 concepts together, leading to crap like "closed as duplicate" and all the pain that comes from that. If you separate the two concepts, then users file reports, triagers either create a new bug or attach the report to an existing bug. Duplicate reports become harmless or even positive if they contain extra info. You can even attach a report to multiple bugs.
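A minimal sketch of what such a separation might look like as a data model. This is not the tracker mentioned above; the class names `Bug` and `Report`, their fields, and the `attach` helper are all assumptions made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity; the two classes reference each other
class Bug:
    """One underlying defect, created by a triager (hypothetical model)."""
    bug_id: int
    summary: str
    reports: list[Report] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Report:
    """One user's account of a problem; it may be attached to several bugs."""
    report_id: int
    reporter: str
    description: str
    bugs: list[Bug] = field(default_factory=list, repr=False)


def attach(report: Report, bug: Bug) -> None:
    """Link a report and a bug in both directions, ignoring repeated calls."""
    if bug not in report.bugs:
        report.bugs.append(bug)
        bug.reports.append(report)


# Two users describe the same problem; a triager attaches both reports to
# one bug instead of closing the second report as a duplicate.
hang = Bug(1, "system hangs under heavy disk load")
first = Report(101, "user-a", "machine froze while copying files")
second = Report(102, "user-b", "same freeze, with steps to reproduce")
attach(first, hang)
attach(second, hang)
print(len(hang.reports))
```

In this shape a second report adds information to the bug rather than being filtered out, and a report that touches two defects can simply be attached to both.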
Posted Jan 11, 2008 12:27 UTC (Fri)
by mjthayer (guest, #39183)
[Link]
Posted Jan 22, 2008 1:10 UTC (Tue)
by dkite (guest, #4577)
[Link]
Development issues part 2: Bug tracking
The RH bugzilla is slow, and part of the problem is the Query mask where there is one entry
for each package. This gives a HUGE html file for browsers to parse. This would really make me
want to use the search form less and less (if I actually used FC more, where I would run into
bugs that block my task), and let some developer see if it is a duplicate.
Development issues part 2: Bug tracking
"So, while bug trackers are a useful tool for ensuring that problems do not fall through the
cracks, it seems that one of the most potent anti-bug tools we have - discussing the problem
via a widely-distributed email list - is the same tool we have been using for decades."
The bug tracker of Ruby is a mailing list (ruby-talk).
But when you use a mailing list like a bug tracker, with more than 100 bugs every day, it's
clearly too big for anyone to keep up the pace. And so is this mailing list. In the end, it was
really used like a bug tracker, and discussions about development were on other mailing lists.
IMHO, the problem is not the means (bug tracker, mailing list, IRC or whatever); it's mainly a
lack of resources. Everyone in the world can post a bug report. But only a tiny part of
"everyone" is able to read this report and say if it's really a bug or not. And an even smaller
part of them knows how to fix it.
Development issues part 2: Bug tracking
Yeah, a bug tracker is no use if the bugs don't get fixed.
I stopped using Mepis and won't go back because nobody cared about a bug that was causing me
grief.
I've submitted a bug or two to Debian and KDE and the lack of any response makes me unlikely
to bother again, except for a real show stopper.
Development issues part 2: Bug tracking
I have an even better example of bugs in the kernel bug database
languishing. Bug #2645 - msync() does not update the st_mtime and
st_ctime fields (http://bugzilla.kernel.org/show_bug.cgi?id=2645).
A relatively clear cut bug that is easily reproduced on any machine, and
can even cause dataloss (because mtime isn't updated, most incremental
backup systems will ignore the file, as it doesn't seem to have changed).
The bug was reported in 2004, and only just now something is starting to
happen in this regard, even though several patches for this problem have
appeared on linux-kernel.
Quite underwhelming.
Old chestnut:
"All bugs are shallow, given enough eyes." As you point out with the SCSI bug, it isn't
assigning a bug number, or identifying a problem, but getting more eyes interested in the
problem, that fixes things. F/OSS relies on people scratching itches, and bug repair relies
on making enough people itch.
Development issues part 2: Bug tracking
It's reasonable to claim that the interdependence of data in the kernel is bigger than in
GNOME.
(data = code, knowledge required, etc.)
To scale you need to distribute, and to scale better when distributing you need to minimize
data interdependence. In GNOME there are many separate projects with little or no dependence
on each other - at least compared to the kernel.
Buzzwords, guessing... not facts.. etc...:)
Also - the kernel has more code?
Development issues part 2: Bug tracking
I think one common problem with bug tracking systems is that they don't provide a good
workflow for reporters from the point of having a problem to the point of getting other people
involved. It is pretty much impossible for a user with a problem to identify whether the
problem has been reported already by using the query capabilities built into most bug
trackers, and, if a user doesn't find their bug already in the system, they have to re-enter
all of the information that they used to search with. Furthermore, the user has to do
something different to find out that something is a common user error. And the user has to do
yet another thing in the case where the problem has been reported by somebody else, but the
report doesn't cover all of the aspects of the new situation. So there's a high chance of user
error in this process, and poor handling of failure cases.
On the other side, one of the most useful things in resolving a bug is a second report which
clarifies or corrects the description of what's actually going on, and this is something that
bug tracking systems and triage actively filter out at pretty much all stages.
Development issues part 2: Bug tracking
Well, another issue is how you deal with things that are fixed. Until everybody is using a
version of the software which has an appropriate error message, which is effectively never,
there will be people who get the old behavior and can't find out from the software that a
newer version is available and fixed their issue.
And, of course, fixing the user documentation doesn't help, because users never read
documentation until something goes wrong, at which point they're faced with the choice of
whether to go to the documentation or to the bug tracker, and they don't necessarily do the
right one. Ideally, there should be some common troubleshooting starting point where the
answer can be: read this section of the manual; or use a version newer than X; or wait for the
next release; or add your account of the issue to this bug; or make a new bug with this
information.
Development issues part 2: Bug tracking
One approach to the bug tracker problem is to make the bug reporter "responsible" for their
bug report - i.e. for looking for other relevant information, including duplicates in the bug
tracker, duplicates in other bug trackers, possible workarounds and fixes and people who might
know how to help with it. This could be combined with a policy of automatically closing bugs
which are inactive for too long (with the option of the reporter re-opening them). Granted this
will not solve all problems, but it may help a bit.
It can also help to give the people who report bugs (especially the non-coding types) the
feeling that they are making valuable contributions, which in turn can make them more
enthusiastic about the software :)
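The auto-close policy suggested above could look roughly like this. It is a sketch under assumptions: the `BugEntry` record, the function names, and the 180-day idle limit are all invented for the example and do not describe any real tracker's behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BugEntry:
    """Hypothetical minimal bug record: only what the policy needs."""
    bug_id: int
    last_activity: datetime
    status: str = "open"  # either "open" or "closed"


def close_inactive(bugs, now, max_idle=timedelta(days=180)):
    """Close every open bug idle for longer than max_idle.

    Returns the list of bugs closed by this call. The 180-day default is
    an arbitrary value picked for the example.
    """
    closed = []
    for bug in bugs:
        if bug.status == "open" and now - bug.last_activity > max_idle:
            bug.status = "closed"
            closed.append(bug)
    return closed


def reopen(bug, now):
    """Let the reporter reopen a bug; reopening counts as fresh activity."""
    bug.status = "open"
    bug.last_activity = now
```

Counting a reopen as activity keeps the same bug from being closed again on the next sweep, which is what gives the reporter's action any effect.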
Bug tracking is a communication issue
Not a 'keeping track' issue.
We have this strange belief that if only the bug was written down
somewhere it has a better chance of being fixed. The only thing writing
it down does is, um, writing it down.
KDE has a huge bug database filled with a huge number of bugs. Some
valid, many duplicates, many wishes, a large number too old and possibly
fixed but who knows.
The best way to ruin anyone's desire to help and contribute to free
software (imho) is to assign them to bug triage. My joints hurt thinking
about it. There are rare people with the mix of personality traits that
love bug triage and are very good at it. A bronze bust must be made of
each one. No sane person would do it for nothing :) There are probably
fewer people with these gifts than there are software developers.
So we have a situation where it is easier to harness the manpower to
write a bug database than it is to maintain the content within.
Someone told me of an executive who would clear his desk on Friday
evening into the garbage can. His theory was that if something was very
important it would show up again the next week. I think bugs fall into
that category. And we wouldn't have to maintain the fiction of people
caring about reported bugs.
Providing a way of communicating with the developer at a data rate that
they can handle would seem more productive.
Derek