
Bug reports: information or spam?

April 11, 2012

This article was contributed by Nathan Willis

The Debian project recently debated the inherent trade-off between making a bug reporting tool easy to use and turning it into a firehose that puts out more volume than the developers and maintainers can process. The impetus this time was false positives caught by the spam filter of the Debian bug tracking system (BTS), but it is a question that the distribution, like most distributions, grapples with regularly.

Why we foo

Michael Welle raised the issue on the Debian-devel mailing list, reporting that he had attempted to file a bug and was surprised when the BTS rejected his report because it contained a blacklisted URL. The surprise was that describing the bug in question required him to use a URL, and he had chosen what he thought was a general-purpose example: www.foo.org. But evidently the foo.org domain is on the blacklist of the uribl.com filtering service, which the BTS uses to strain out incoming spam. "Interesting user experience, bug reporters will like that big time..." Welle said.

Martin Krafft and Andrey Rahmatullin quickly replied that only RFC 2606-defined example URLs should be used in bug reports, to which Welle asked whether it was unreasonable to expect users to read an RFC before reporting a bug. Rahmatullin responded that users should "try not to use suspicious URLs."

At that point, Russ Allbery said that the root of the problem "is that foo.org is a real domain, and one that appears to be owned by one of those domain parking companies that quite likely could be doing lots of grey things with the domain. A lot of those companies are at the least spammers." In all likelihood, he added, foo.org really was used for spam at some point, although he conceded that it could be a false positive, due to others, like Welle, choosing it for its placeholder meaning (ironically, "foo" is documented as a placeholder in RFC 3092).

Getting practical

To that, Welle replied that he found the reliance on a real-time blacklist managed by a third party problematic. Fernando Lemos asked whether there was really any way to fix it. "We certainly can't disable spam filters or we'll be flooded with spam." Also, he added, because the BTS returns an error message explaining what blocked the bug from being accepted, in all likelihood real users will be able to correct the problem and resubmit. Anyone who cannot decipher the message and fix it is probably "not very tech-savvy," which decreases the odds that the report would be particularly useful. "I'm not saying it's good that we miss reports like this," he concluded, "but we must put things into perspective."

Interestingly, Ben Pfaff chimed in with the suggestion that the BTS could weed out spam by inspecting the report's metadata, looking for valid package names and versions. No one replied to that comment, though; instead the focus of the discussion turned to the acceptable threshold for rejecting otherwise valid bug reports.

Welle responded by saying that he was trying to raise the URL-filtering issue from bug reporters' perspective; people who "simply want to report a bug without being interested in external blacklist and stuff." He compared the bug report rejection with a company telling customers "we don't like you, go away." Russell Coker replied:

Actually companies do that all the time. Some corporate web sites used to reject browsers other than IE. [...] So comparing Debian to a commercial organisation doesn't support your case at all. Commercial organisations are more than willing to reject some customers if it makes things easy for them.

Welle replied that he tended "to look for role models above me, not below me. Why imitate people or companies that do a bad job? We can do better. And of course, to come back to my initial email, I doubt that using the blacklist service makes anything easier for Debian." Debian Project Leader Stefano Zacchiroli concurred with that sentiment, saying that one of the project's role models is "people who report bugs and attach patches to them," and suggesting that Welle submit a report against the lists.debian.org pseudo-package in order to continue the discussion there.

Reporting culture

Welle agreed with Zacchiroli's assessment of the situation, although he does not yet appear to have filed a new bug on the subject. From outside the project, Welle's concern raises two distinct issues: how to deal with false positives in the BTS's spam-filtering system, and the user-friendliness of the BTS as a whole (particularly where error messages are concerned).

It is hard to imagine that there are more than a handful of URL patterns that are both likely to be spontaneously chosen as examples by a bug reporter and likely to be on the blacklist of uribl.com or a similar service. RFC 2606 reserves only three second-level domains for use in examples: example.com, example.net, and example.org. Perhaps exceptions could be made, or other techniques to filter out spam (such as Pfaff's suggestion) could be incorporated.
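For illustration, a reporting tool could warn about placeholder hosts before a report ever reaches the spam filter. The following Python sketch is not part of reportbug or the BTS; the function names and the deliberately loose URL pattern are assumptions made for this example. It flags any host in a piece of text that falls outside the names RFC 2606 reserves (the three example domains plus the .test, .example, .invalid, and .localhost top-level domains).

```python
import re
from urllib.parse import urlparse

# Second-level domains reserved for examples by RFC 2606.
RESERVED_DOMAINS = ("example.com", "example.net", "example.org")
# Top-level domains reserved by RFC 2606.
RESERVED_TLDS = ("test", "example", "invalid", "localhost")

# Loose pattern (an assumption of this sketch): explicit http/https/ftp
# URLs, or bare host names that start with "www.".
URL_PATTERN = re.compile(r"\b(?:(?:https?|ftp)://|www\.)[^\s<>\"']+", re.IGNORECASE)


def is_reserved_host(host):
    """Return True if host falls under a name reserved by RFC 2606."""
    host = host.lower().rstrip(".")
    if host.rsplit(".", 1)[-1] in RESERVED_TLDS:
        return True
    return any(host == d or host.endswith("." + d) for d in RESERVED_DOMAINS)


def non_reserved_hosts(text):
    """List the hosts mentioned in text that are not reserved example names."""
    hosts = []
    for match in URL_PATTERN.findall(text):
        match = match.rstrip(".,;:)")  # drop trailing sentence punctuation
        url = match if "://" in match else "http://" + match
        host = urlparse(url).hostname
        if host and not is_reserved_host(host) and host not in hosts:
            hosts.append(host)
    return hosts


report = "Fetching www.foo.org fails, but http://www.example.com/ works."
for host in non_reserved_hosts(report):
    print("warning: %s is a real domain; consider an RFC 2606 name" % host)
```

Run against a report like Welle's, a check of this kind would single out www.foo.org while letting www.example.com pass.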

But, while it is undeniable that the BTS does require sturdy anti-spam measures, bug 635940 from July 2011 also questions the uribl.com blacklist. In that report, Blars Blarson responded that even with the URL filter, BTS still gets several hundred spam messages every day, and that before it, the daily count was in the tens of thousands, often totaling more than a gigabyte. The blacklist works by sending an SMTP 550 error code, which indicates that the requested mailbox does not exist; this explicit rejection is supposed to winnow out repeat offenders that simply dropping the messages would not.
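An SMTP reply consists of a three-digit code followed by free-form text, and a multi-line reply marks its continuation lines with a hyphen after the code. As a rough sketch of how a submitting client might surface that text to the user, the Python below splits a reply into its code and message. The function names are hypothetical, and the sample rejection text is invented for the example; the BTS's actual wording is not quoted here.

```python
def parse_smtp_reply(lines):
    """Return (code, text) for a single-line or multi-line SMTP reply."""
    code = None
    parts = []
    for line in lines:
        line = line.rstrip("\r\n")
        if len(line) < 3 or not line[:3].isdigit():
            raise ValueError("not an SMTP reply line: %r" % line)
        line_code = int(line[:3])
        if code is None:
            code = line_code
        elif line_code != code:
            raise ValueError("reply code changed within one reply")
        # Character 4 is "-" on continuation lines and " " on the last line;
        # everything after it is free-form text.
        parts.append(line[4:])
    if code is None:
        raise ValueError("empty reply")
    return code, " ".join(p for p in parts if p)


def is_permanent_failure(code):
    """5xx replies tell the sender not to retry the same message."""
    return 500 <= code <= 599


# Invented sample reply, for illustration only.
sample = ["550-Message rejected\r\n", "550 blacklisted URL in body\r\n"]
code, text = parse_smtp_reply(sample)
if is_permanent_failure(code):
    print("rejected (%d): %s" % (code, text))
```

The point of the sketch is that the numeric code and the human-readable explanation travel together, so a client can show the reporter why the message was refused.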

Many distributions struggle with making their bug submission process easier to use, but Debian has extra challenges because it is in that small minority which (1) does not offer any sort of web-based bug submission form and (2) crucially, allows bug reports to be filed via email. The preferred method of reporting a bug is the reportbug command-line utility, which collects information from the user and the OS, and dispatches its report via email. Email reports can also be filed manually, if the correct formatting is used.

The other large distributions offer web bug-reporting tools, but increasingly the standard practice requires registering an account with email verification. Ubuntu does this through Launchpad, Fedora does the same in Red Hat's Bugzilla, and openSUSE uses the same technique with Novell's Bugzilla. Those systems may attract their fair share of spam (automated or otherwise), but the BTS bug-submission email address is well-publicized, which ensures that it is in the hands of every self-respecting spammer, and has been for years. Launchpad does have an email gateway, but it requires OpenPGP; Bugzilla supports email reporting gateways, but that feature does not appear to be in use by major projects.

The goal of the reportbug tool is to generate more useful reports by gathering specifics. One downside, of course, is the exposure to email spam. But, given that the proper format for email bugs is unlikely to be present in run-of-the-mill spam, one would guess that a fairly simple filter might weed out the vast majority of spam sent to the bug-reporting address.
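To make the idea concrete, here is a minimal Python sketch of such a format check, in the spirit of Pfaff's suggestion. It is not the BTS's code. It assumes that a new report begins with "Package:" and "Version:" pseudo-header lines at the top of the message body, and the regular expressions are simplified approximations of Debian's package-name and version syntax; the function names and the optional known_packages set are hypothetical.

```python
import re

# Simplified approximations, assumed for this sketch.
PACKAGE_RE = re.compile(r"^[a-z0-9][a-z0-9+.-]+$")
VERSION_RE = re.compile(r"^(?:[0-9]+:)?[0-9][0-9A-Za-z.+~-]*$")
HEADER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.*)$")


def pseudo_headers(body):
    """Collect the 'Name: value' lines that open the message body."""
    headers = {}
    for line in body.lstrip("\n").splitlines():
        match = HEADER_RE.match(line)
        if not match:
            break  # the first non-header line ends the pseudo-header block
        headers[match.group(1).lower()] = match.group(2).strip()
    return headers


def looks_like_bug_report(body, known_packages=None):
    """Return True if the body opens with plausible package metadata."""
    headers = pseudo_headers(body)
    package = headers.get("package", "")
    version = headers.get("version", "")
    if not PACKAGE_RE.match(package):
        return False
    if version and not VERSION_RE.match(version):
        return False
    if known_packages is not None and package not in known_packages:
        return False
    return True


report = "Package: reportbug\nVersion: 6.3.1\nSeverity: normal\n\nIt crashes.\n"
spam = "Buy cheap pills today at http://spam.invalid/\n"
print(looks_like_bug_report(report))  # plausible metadata is present
print(looks_like_bug_report(spam))    # no Package pseudo-header
```

A check like this can only cover new submissions; it says nothing about free-form follow-up mail to existing bugs, and it stops working once a spammer copies the format.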

As Pedro Larroy pointed out in May 2011, however, the absolute dependence on SMTP can also be a problem, given that users may find themselves on a network that filters out SMTP (or on a private network with no access to the outside world). Larroy suggested that Debian add an HTTP transport mechanism for reportbug to fall back on; there was general agreement on the value of such a fallback, but many (including Ian Jackson) also argued that HTTP was a slippery slope, and that if such a gateway to the BTS was built, someone would (perhaps with the best of intentions) write a web-submission-form easily exploited by spammers and other attackers.
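The fallback Larroy proposed amounts to trying one delivery mechanism after another until one succeeds. The Python sketch below shows only that control flow; it is not reportbug's code, and the transport names and stand-in functions are hypothetical. Real transports built on smtplib or urllib would fit the same shape, since their connection errors derive from OSError.

```python
class DeliveryError(Exception):
    """Raised when every configured transport has failed."""


def deliver(report, transports):
    """Try each (name, send) pair in order; return the name that worked."""
    failures = []
    for name, send in transports:
        try:
            send(report)
        except OSError as exc:  # network-level failure: move to the next one
            failures.append("%s (%s)" % (name, exc))
        else:
            return name
    raise DeliveryError("all transports failed: " + ", ".join(failures))


# Stand-in transports for illustration only.
def send_via_smtp(report):
    raise ConnectionRefusedError("outbound SMTP is filtered")


def send_via_http(report):
    pass  # pretend an HTTP POST succeeded


used = deliver("Package: example\n\nbody", [("smtp", send_via_smtp),
                                            ("http", send_via_http)])
print("report sent via", used)
```

Keeping the transports as interchangeable callables leaves the ordering, and the decision about whether an HTTP gateway should exist at all, as a matter of configuration rather than code.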

Of course, the uncomfortable truth is that Debian, like most large projects, knows that making bug reports harder to file reduces the number of reports, which reduces the time burden shouldered by developers, package maintainers, and bug triagers. Josselin Mouette said as much in the 2011 discussion, observing:

We already receive more bug reports than we can handle. We need less bug reports, but more useful ones. Ergo, putting an entry barrier to reporting bugs is not that silly.

Not everyone agreed; Patrick Strasser argued that "artificially throttling" reports is bad, and that user education needs to be integral to the reporting process. But Allbery countered that no one has the time to engage in user education, and consequently, making reporting less convenient "doesn't *fix* the problem, but it does weed out a lot of users who don't know how to file good bug reports (and some users who do, which is indeed a drawback)."

Ultimately, a collaborative project is going to include people with differing views on whether bug reports "serve" the users (by improving the software) or "serve" the developers (by providing information). MBA programs might classify this as a question over who is the customer and who is the supplier — the customer being the party whose needs are ultimately more important. Debian has veered into that debate before, as in the "What bugs reports are for" thread from March 2011, when Jesús Navarro cautiously suggested to Jackson that point four of the Debian Social Contract establishes users as the first priority. No one in the project seems to disagree about that principle — the thorny problem is that maintaining a balance between the ease of bug reporting and the demands placed on developers requires constant attention and adjustment.



Bug reports: information or spam?

Posted Apr 12, 2012 7:16 UTC (Thu) by dwmw2 (subscriber, #2063) [Link]

"The blacklist works by sending an SMTP 550 error code, which indicates that the requested mailbox does not exist"
Really? The message actually says that the mailbox does not exist, rather than giving the real reason the message was rejected? That would be particularly silly, if it's true.

SMTP rejection messages contain free-form text after the error code, which should always be visible to the user when they receive a bounce message. The spam-bots will only care that the message was rejected; real users should see the message, and it should be meaningful.

Bug reports: information or spam?

Posted Apr 13, 2012 12:26 UTC (Fri) by nix (subscriber, #2304) [Link]

It is meaningful, but the attached error code is 550.

Bug reports: information or spam?

Posted Apr 12, 2012 15:55 UTC (Thu) by joey (subscriber, #328) [Link]

> But, given that the proper format for email bugs is unlikely to be present
> in run-of-the-mill spam, one would guess that a fairly simple filter
> might weed out the vast majority of spam sent to the bug-reporting
> address.

There are two problems with this. First, $nnnn@bugs.debian.org accepts absolutely freeform email messages to followup to existing bug reports. So that's hundreds of thousands of email addresses spammers could still send spam to, even if submit@bugs.debian.org has smart filtering.

Secondly, all it takes is a few spammers who set up a targeted spam template with the appropriate headers for submit@, and suddenly we have to deal with not only large volumes of incoming spam, but spam that creates new bug reports that choke up lists of the real bug reports. So I think it's smart to not force the spammers to learn how to do that.

Bug reports: information or spam?

Posted Apr 13, 2012 15:25 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

all it takes is a few spammers who set up a targeted spam template with the appropriate headers for submit@

It's hard to imagine any spammer doing that. The spammer would be writing special code to target less than a hundred pairs of human eyes. That doesn't even fit the definition of spam.

Bug reports: information or spam?

Posted Apr 13, 2012 15:30 UTC (Fri) by corbet (editor, #1) [Link]

A lot of spammers don't care about human eyes; they target only the Google web crawler. They are more than willing to bash out a script to get their URLs into public places; trust me on this one.

RTFRFC?

Posted Apr 13, 2012 7:23 UTC (Fri) by Max.Hyre (subscriber, #1054) [Link]

It's obvious that expecting submitters to read RFC 2606 without being told they need to is a non-starter. So is telling them to read it. Would it be reasonable to add to the instructions something like


	All meta-domains should be based on
	``example''; e.g.:  www.example.com,
	ftp.example.fr, george.says.example.tv.

If it's not too much, the text might include ``(For details, see RFC 2606.)''.

Bug reports: information or spam?

Posted Apr 13, 2012 10:58 UTC (Fri) by wookey (subscriber, #5501) [Link]

The email-only reporting is a real pain for competent devs too. I find myself wanting to report bugs on machines with no working SMTP access all the time. (arm dev boards, laptops and desktop inside companies which don't allow SMTP, and just the fact that to send email requires configuring the local MUA with the CRAM-MD5 blackmagic for secure SMTP to my mailserver - I don't want to do that for every install/chroot/image that has problems). The idea that SMTP is _always_ available comes from a desktop/server-centric view of the world.

Yes, there are all sorts of workarounds: save the reportbug file (remembering to fish it out of /tmp before a reboot loses it all), cut and paste into working email on some other box, run reportbug elsewhere (risk of wrong package metadata), etc. But none of it feels like a good use of developer time.

An http (or scp?) transport would be really useful.

Ubuntu's apport-cli which does an initial contact over http to initiate a report and lets you fill the rest in online is very useful in these situations, even if I much prefer a nice email report over clicking about in launchpad. I'd really appreciate an easy way of sending in bugreps from places where SMTP is not available/configured.

Bug reports: information or spam?

Posted Apr 13, 2012 12:51 UTC (Fri) by amonnet (subscriber, #54852) [Link]

Are you that sure http is the definitive way ?
Think usb-gadget, development board, ultra-paranoid firewall ...

Nothing could replace plain old mail.
http://pastebin.com/raw.php?i=4Wr2M3ag
Print. Cut. Fill. Post.

+++

Bug reports: information or spam?

Posted Apr 13, 2012 13:07 UTC (Fri) by wookey (subscriber, #5501) [Link]

Not definitive, but it's rare not to have even http available if there is any networking at all. In general when I haven't got SMTP I have got http, so having that option would definitely be useful. Clearly there are cases when that won't work either.

Bug reports: information or spam?

Posted Apr 13, 2012 20:19 UTC (Fri) by josh (subscriber, #17465) [Link]

You definitely don't need to run a mail server to use reportbug. By default, reportbug configures itself to talk to reportbug.debian.org, which accepts mail for Debian bug reports (and nothing else). It also uses the submission port, which doesn't get blocked nearly as often as port 25.

Bug reports: information or spam?

Posted Apr 14, 2012 15:05 UTC (Sat) by zack (subscriber, #7062) [Link]

> It also uses the submission port, which doesn't get blocked nearly as often as port 25.

Right, but it is still more often blocked than http/https.

For that reason, I think that http submission for reportbug would be worthwhile, and that's why I've posted a while ago half a patch (the server-side half) for that at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=590269#51

What is missing to deploy that is the client-side half, i.e. support in reportbug to deliver MIME bug report via HTTP. If some kind soul is willing to do that, I'll be happy to deploy the server-side half somewhere for testing. See the above URL for more information.

Cheers.

Bug reports: information or spam?

Posted Apr 14, 2012 23:41 UTC (Sat) by dlang (✭ supporter ✭, #313) [Link]

but there are also many times when e-mail will work and http/https is blocked.

There is no one protocol that will always work, it's a good idea for the reporting tool to support several different protocols.

Bug reports: information or spam?

Posted Apr 14, 2012 23:54 UTC (Sat) by josh (subscriber, #17465) [Link]

I don't think it makes sense to make every tool that needs to communicate over a network support several different protocols to transmit the same information. Just make it support one reliable protocol, and provide tools (such as SSH tunnels, VPNs, and Tor) for people on intentionally broken networks to get a usable connection.

Bug reports: information or spam?

Posted Apr 15, 2012 0:07 UTC (Sun) by dlang (✭ supporter ✭, #313) [Link]

the networks where e-mail is blocked are not likely to allow SSH tunnels, VPNs, or Tor connections out either.

We also aren't talking about making all programs that communicate talk multiple protocols, we are talking about a specific use-case, submitting bug reports. By definition, when you are submitting a bug report, something is broken. As such, you should support multiple ways to submit the bug so that you can work around whatever is broken.

Also, the networks in question are only "broken" if you think that every computer in existence should be able to talk freely to every other computer in existence. This model of reality disappeared (if it ever really existed) decades ago. Security and Access restrictions are not only just the reality, they are very desirable in many cases.

Bug reports: information or spam?

Posted Apr 15, 2012 1:11 UTC (Sun) by josh (subscriber, #17465) [Link]

> the networks where e-mail is blocked are not likely to allow SSH tunnels, VPNs, or Tor connections out either.

What makes a network that only allows outbound HTTP different than a network that only allows some obscure protocol outbound, or a network that allows no outbound access at all? Should reportbug support DNS-based transmission to get through networks that block HTTP?

> Also, the networks in question are only "broken" if you think that every computer in existence should be able to talk freely to every other computer in existence. This model of reality disappeared (if it ever really existed) decades ago. Security and Access restrictions are not only just the reality, they are very desirable in many cases.

Disallowing inbound access makes sense for security. Disallowing outbound access (with the *possible* exception of port 25 on networks with a pile of infected spam-sending systems that can't just be kicked off the network) makes a network broken.

Bug reports: information or spam?

Posted Apr 15, 2012 1:41 UTC (Sun) by dlang (✭ supporter ✭, #313) [Link]

>> the networks where e-mail is blocked are not likely to allow SSH tunnels, VPNs, or Tor connections out either.

> What makes a network that only allows outbound HTTP different than a network that only allows some obscure protocol outbound, or a network that allows no outbound access at all? Should reportbug support DNS-based transmission to get through networks that block HTTP?

no, you should not implement DNS-based transmission, ping based transmission, or other weird new protocols.

But for a bug reporting tool, you should support the common data communication protocols.

> Disallowing inbound access makes sense for security. Disallowing outbound access (with the *possible* exception of port 25 on networks with a pile of infected spam-sending systems that can't just be kicked off the network) makes a network broken.

Here I (and most security people) just disagree with you. It all depends on the purpose of the network, if the network is not intended to talk to the Internet, creating the ability for it to talk directly to the Internet is a bad idea.

Bug reports: information or spam?

Posted Apr 15, 2012 3:31 UTC (Sun) by josh (subscriber, #17465) [Link]

> But for a bug reporting tool, you should support the common data communication protocols.

SMTP is pretty common. :)

(Note, by the way, that I'm not attempting to argue against the implementation of HTTP for other reasons; I just think "because there are networks that block SMTP" doesn't seem like a good enough reason.)

> Here I (and most security people) just disagree with you. It all depends on the purpose of the network, if the network is not intended to talk to the Internet, creating the ability for it to talk directly to the Internet is a bad idea.

On the contrary, I agree that air-gapped networks potentially make sense. If you want a restricted network with *no* outbound access, by all means have one. And if your network should not provide access to the Internet, don't try to report bugs from that network. :)

But don't create a network that allows *some* traffic out without allowing *all* traffic out; any sufficiently creative and annoyed developer who just wants to get work done will find a way to turn whatever traffic you do allow through into a real Internet connection, as will anyone trying to get malicious activity through.

Bug reports: information or spam?

Posted Apr 15, 2012 3:54 UTC (Sun) by dlang (✭ supporter ✭, #313) [Link]

security isn't the practice of preventing all possible activities (turning the computer off and sealing it in a Faraday cage will do that), it's a matter of managing risk and slowing down the attacker long enough to catch and stop them.

a network that can do some things, but not all things is a very reasonable, and very common situation.

Bug reports: information or spam?

Posted Apr 13, 2012 15:06 UTC (Fri) by sumanah (guest, #59891) [Link]

It's a difficult trick, only putting up the barriers that stop malicious persons, and balancing inclusivity for new bug reporters/users with convenience for the people who have to triage and respond to those bug reports. Thanks for the coverage.

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds