April 11, 2012
This article was contributed by Nathan Willis
The Debian project recently debated the inherent trade-offs between making a bug reporting tool easy to use and turning it into a firehose that puts out more volume than the developers and maintainers can process. The impetus this time was false positives caught by the Debian bug tracking system's (BTS) spam filter, but it is a question that the distribution (and indeed most distributions) grapples with regularly.
Why we foo
Michael Welle raised the issue on the Debian-devel mailing list, reporting
that he had attempted to file a bug and was surprised when the BTS rejected
his report because it contained a blacklisted URL. The surprise was that
describing the bug in question required him to use a URL, and he had chosen
what he thought was a general-purpose example: www.foo.org. But
evidently the foo.org domain is on the blacklist of the uribl.com filtering service, which the BTS uses to strain out incoming spam. "Interesting user experience, bug reporters will like that big time..." Welle said.
Martin Krafft and Andrey Rahmatullin quickly replied that only RFC 2606-defined example URLs should be used in bug reports, to which Welle asked whether it was unreasonable to expect users to read an RFC before reporting a bug. Rahmatullin responded that users should "try not to use suspicious URLs."
At that point, Russ Allbery said
that the root of the problem "is that foo.org is a real domain, and
one that appears to be owned by one of those domain parking companies that
quite likely could be doing lots of grey things with the domain. A lot of
those companies are at the least spammers." In all likelihood, he
added, foo.org really was used for spam at some point, although he
conceded that it could be a false positive, due to others, like Welle,
choosing it for its placeholder meaning (ironically, "foo" is documented
as a placeholder in RFC 3092).
Getting practical
To that, Welle replied that he found the reliance on a real-time blacklist managed by a third party problematic. Fernando Lemos asked whether there was really any way to fix it. "We certainly can't disable spam filters or we'll be flooded with spam." Also, he added, because the BTS returns an error message explaining what blocked the bug from being accepted, in all likelihood real users will be able to correct the problem and resubmit. Anyone who cannot decipher the message and fix it is probably "not very tech-savvy," which decreases the odds that the report would be particularly useful. "I'm not saying it's good that we miss reports like this," he concluded, "but we must put things into perspective."
Interestingly, Ben Pfaff chimed in with the suggestion that the BTS could weed out spam by inspecting the report's metadata, looking for valid package names and versions. No one replied to that comment, though; instead the focus of the discussion turned to the acceptable threshold for rejecting otherwise valid bug reports.
Welle responded
by saying that he was trying to raise the URL-filtering issue from the
perspective of bug reporters: people who "simply want to report a bug
without being interested in external blacklist and stuff." He
compared the bug report rejection with a company telling customers
"we don't like you, go away." Russell Coker replied:
Actually companies do that all the time. Some corporate web sites used to
reject browsers other than IE. [...] So comparing Debian to a commercial
organisation doesn't support your case at all. Commercial organisations
are more than willing to reject some customers if it makes things easy for
them.
Welle replied that he tended "to look for role models above me, not below me. Why imitate people or companies that do a bad job? We can do better. And of course, to come back to my initial email, I doubt that using the blacklist service makes anything easier for Debian." Debian Project Leader Stefano Zacchiroli concurred with that sentiment, saying that one of the project's role models is "people who report bugs and attach patches to them," and suggesting that Welle submit a report against the lists.debian.org pseudo-package in order to continue the discussion there.
Reporting culture
Welle agreed with Zacchiroli's assessment of the situation, although as of yet he does not appear to have filed a new bug on the subject. From outside the project, Welle's concern raises two distinct issues: how to deal with false positives in the BTS's spam-filtering system, and the user-friendliness of the BTS as a whole (particularly where error messages are concerned).
It is hard to imagine that there are more than a handful of URL patterns
that are both likely to be spontaneously chosen as examples by a bug
reporter and listed on the blacklist of uribl.com or a similar service.
RFC 2606 only offers three example domains: example.com, example.net, and
example.org. Perhaps exceptions could be made, or other techniques to
filter out spam (such as Pfaff's suggestion) could be incorporated.
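For reporters who want to know whether a placeholder is safe to use, the RFC 2606 check is easy to express in code. The following is only a sketch: the function name is made up for illustration, and in addition to the three second-level domains it also accepts the four top-level names that RFC 2606 reserves (.test, .example, .invalid, and .localhost).

```python
# Names reserved by RFC 2606 for testing and documentation.
RESERVED_TLDS = {"test", "example", "invalid", "localhost"}
RESERVED_SECOND_LEVEL = {"example.com", "example.net", "example.org"}


def is_rfc2606_reserved(host):
    """Return True if host falls under a name reserved by RFC 2606.

    Hypothetical helper for illustration; not part of the BTS or reportbug.
    """
    labels = host.lower().rstrip(".").split(".")
    if labels[-1] in RESERVED_TLDS:
        return True
    # Compare the last two labels, so www.example.org also qualifies.
    return ".".join(labels[-2:]) in RESERVED_SECOND_LEVEL
```

With this check, www.example.org passes while Welle's www.foo.org does not, which is exactly the distinction Krafft and Rahmatullin were drawing.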
But, while it is undeniable that the BTS does require sturdy anti-spam
measures, bug
635940 from July 2011 also questions the uribl.com blacklist. In that
report, Blars Blarson responded that even with the URL filter, the BTS still gets several hundred spam messages every day, and that before it, the daily count was in the tens of thousands, often totaling more than a gigabyte. The filter works by returning an SMTP 550 error code, a permanent failure that nominally indicates that the requested mailbox is unavailable; this explicit rejection is supposed to winnow out repeat offenders that simply dropping the messages would not.
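The general shape of such a URL filter can be sketched in a few lines. This is not the BTS's actual code: the function names, the host-matching pattern, and the wording of the rejection text are all assumptions, and the blacklist lookup is passed in as a callable rather than querying uribl.com.

```python
import re

# Loose pattern for host names in free text; it will also match
# things like "file.txt", which a real filter would need to handle.
HOST_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def candidate_domains(host):
    """Yield the host and each parent domain with at least two labels."""
    labels = host.lower().split(".")
    for i in range(len(labels) - 1):
        yield ".".join(labels[i:])


def smtp_verdict(body, is_listed):
    """Return an (SMTP code, text) pair for a message body.

    is_listed is a hypothetical callable that answers whether a
    domain is on the blacklist; the reply text is invented.
    """
    for host in HOST_RE.findall(body):
        for domain in candidate_domains(host):
            if is_listed(domain):
                return 550, f"message contains blacklisted URL ({domain})"
    return 250, "OK"
```

Given a blacklist containing foo.org, a report that mentions www.foo.org is refused with a 550, because the parent domain is checked along with the full host name; a report that mentions only example.org is accepted.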
Many distributions struggle with making their bug submission process
easier to use, but Debian has extra challenges because it is in that small
minority which (1) does not offer any sort of web-based bug submission
form and (2) crucially, allows bug reports to be filed via email.
The preferred method of
reporting a bug is the reportbug command-line utility, which
collects information from the user and the OS, and dispatches its report
via email. Email reports can also be filed manually, if the correct
formatting is used.
The other large distributions offer web bug-reporting tools, but
increasingly the standard practice requires registering an account with
email verification. Ubuntu does this through
Launchpad, Fedora does the same
in Red Hat's Bugzilla, and openSUSE uses the same technique
with Novell's Bugzilla. Those systems may attract their fair share of
spam (automated or otherwise), but the BTS bug-submission email address is
well-publicized, which ensures that it is in the hands of every
self-respecting spammer, and has been for years. Launchpad does have an
email gateway, but it requires OpenPGP; Bugzilla supports email reporting
gateways, but that feature does not appear to be in use by major
projects.
The goal of the reportbug tool is to generate more useful reports
by gathering specifics. One downside, of course, is the exposure to email
spam. But, given that the proper format for email bugs is unlikely to be
present in run-of-the-mill spam, one would guess that a fairly simple
filter might weed out the vast majority of spam sent to the bug-reporting
address.
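As a rough illustration of how simple such a filter could be, the sketch below looks for the Package and Version pseudo-headers that open the body of an emailed bug report. It rests on several assumptions: the function names are invented, known_packages stands in for whatever package list the BTS could consult, and requiring a Version line would wrongly reject reports against pseudo-packages.

```python
import re

# "Name: value" lines at the top of the message body.
PSEUDO_HEADER_RE = re.compile(r"^([A-Za-z][A-Za-z-]*):\s*(\S.*)$")
# Debian package names: lowercase alphanumerics plus "+", "-", ".",
# at least two characters, starting with an alphanumeric.
PACKAGE_RE = re.compile(r"^[a-z0-9][a-z0-9+.-]+$")
# Assumed shape of a version string: starts with a digit.
VERSION_RE = re.compile(r"^[0-9][A-Za-z0-9.+~:-]*$")


def parse_pseudo_headers(body):
    """Collect pseudo-headers up to the first line that is not one."""
    headers = {}
    for line in body.lstrip("\n").splitlines():
        match = PSEUDO_HEADER_RE.match(line.strip())
        if not match:
            break
        headers[match.group(1).lower()] = match.group(2).strip()
    return headers


def looks_like_bug_report(body, known_packages):
    """Accept only bodies naming a known package and a plausible version."""
    headers = parse_pseudo_headers(body)
    package = headers.get("package", "")
    version = headers.get("version", "")
    return (
        bool(PACKAGE_RE.match(package))
        and package in known_packages
        and bool(VERSION_RE.match(version))
    )
```

A message that starts with "Package: reportbug" and a version line passes, while typical spam, which has no pseudo-headers at all, fails on the first line. A spammer who targets the BTS specifically could of course forge the headers, so a check like this would complement the other filters rather than replace them.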
As Pedro Larroy pointed
out in May 2011, however, the absolute dependence on SMTP can also be a
problem, given that users may find themselves on a network that filters out
SMTP (or on a private network with no access to the outside world). Larroy
suggested that Debian add an HTTP transport mechanism for
reportbug to fall back on; there was general agreement on the
value of such a fallback, but many (including
Ian Jackson) also argued that HTTP was a slippery slope, and that if such a
gateway to the BTS were built, someone would (perhaps with the best of intentions) write a web submission form that spammers and other attackers could easily exploit.
Of course, the uncomfortable truth is that Debian, like most large
projects, knows that making bug reports harder to file reduces the number
of reports, which reduces the time burden shouldered by developers,
package maintainers, and bug triagers. Josselin Mouette said as much in
the 2011 discussion, observing:
We already receive more bug reports than we can handle. We need less bug
reports, but more useful ones. Ergo, putting an entry barrier to reporting
bugs is not that silly.
Not everyone agreed; Patrick Strasser argued that "artificially throttling" reports is bad, and that user education needs to be integral to the reporting process. But Allbery countered that no one has the time to engage in user education, and consequently, making reporting less convenient "doesn't *fix* the problem, but it does weed out a lot of users who don't know how to file good bug reports (and some users who do, which is indeed a drawback)."
Ultimately, a collaborative project is going to include people with differing views on whether bug reports "serve" the users (by improving the software) or "serve" the developers (by providing information). MBA programs might classify this as a question over who is the customer and who is the supplier — the customer being the party whose needs are ultimately more important. Debian has veered into that debate before, as in the "What bugs reports are for" thread from March 2011, when Jesús Navarro cautiously suggested to Jackson that point four of the Debian Social Contract establishes users as the first priority. No one in the project seems to disagree about that principle — the thorny problem is that maintaining a balance between the ease of bug reporting and the demands placed on developers requires constant attention and adjustment.