
Mail filtering in Thunderbird 1.5

Your editor recently had a chance to try out the second beta release of Thunderbird 1.5. There are a number of nice additions in this release of Mozilla's mail client - and a few not-so-nice subtractions, in the form of broken extensions. This article will concentrate on a couple of security-related features.

Thunderbird has had spam filtering for some time. Your editor has never given it a full test, however. Happily, an ideal resource exists for this purpose: your editor's 4000-spam-per-day mail stream. A quick config file tweak directed a copy of this stream, unfiltered, into Thunderbird to see how it would react.

The Bayesian filter built into Thunderbird turns out to be a quick learner. After 100 messages or so, it was busily marking most messages itself. The speed with which it learns tempts the user to turn on automatic spam-canning of marked mail early in the process; it is such a delight to see that stuff simply disappear. Training a SpamAssassin filter takes quite a bit longer.

Unfortunately, the Thunderbird filter appears to learn too quickly, with the result that false positives become a problem. As long as Thunderbird is not configured to automatically refile spam, the false positives can be corrected with, one assumes, an appropriate tweaking of the filter. Once spams have been diverted, however, there appears to be no way to tell Thunderbird that it made a mistake. So new Thunderbird users would be well advised to look over its spam classification decisions for some time before empowering it to refile mail automatically.
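In a token-counting filter, "telling it that it made a mistake" amounts to moving a message's tokens from one class's counts to the other's. The Python below is a hypothetical sketch of that idea, not Thunderbird's code. It assumes the filter trained on its own (wrong) verdict, so a correction first undoes that training and then retrains with the right label; the `TokenCounts` class and its per-token score are illustrative only.

```python
from collections import Counter


class TokenCounts:
    """Hypothetical per-token spam/ham counts for a Bayesian-style filter."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0

    def train(self, tokens, is_spam):
        # Count each distinct token once per message.
        unique = set(tokens)
        if is_spam:
            self.spam.update(unique)
            self.nspam += 1
        else:
            self.ham.update(unique)
            self.nham += 1

    def untrain(self, tokens, is_spam):
        # Reverse an earlier train() call made with the same label.
        unique = set(tokens)
        if is_spam:
            self.spam.subtract(unique)
            self.nspam -= 1
        else:
            self.ham.subtract(unique)
            self.nham -= 1

    def correct(self, tokens, was_marked_spam):
        # Assumption: the wrong verdict was trained, so undo it, then relabel.
        self.untrain(tokens, was_marked_spam)
        self.train(tokens, not was_marked_spam)

    def spamminess(self, token):
        # Relative frequency in spam vs. ham; unseen tokens are neutral (0.5).
        s = self.spam[token] / self.nspam if self.nspam else 0.0
        h = self.ham[token] / self.nham if self.nham else 0.0
        if s + h == 0:
            return 0.5
        return s / (s + h)
```

A false positive such as a legitimate "meeting notes" mail misfiled as spam would be repaired with `counts.correct(tokens, was_marked_spam=True)`, after which words like "meeting" stop counting toward spam.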

SpamAssassin's more conservative approach may well turn out to be better for people who cannot afford to lose mail. Happily, Thunderbird 1.5 includes an option which causes it to defer to SpamAssassin on filtering. Thus, the system administrator can use SpamAssassin to add headers to mail, and individual users can have Thunderbird act on those headers if desired.
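For illustration, here is roughly what "acting on those headers" involves. SpamAssassin marks messages with an `X-Spam-Flag: YES` header and an `X-Spam-Status` header of the form `Yes, score=7.2 required=5.0 tests=...`. The sketch below reads those headers with Python's standard `email` package; the function name is hypothetical and this is not how Thunderbird itself implements the option.

```python
import re
from email import message_from_string

# Matches the numeric score inside an X-Spam-Status header value.
SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")


def spamassassin_verdict(raw_message):
    """Return (is_spam, score) from SpamAssassin headers; score is None if absent."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    flagged = msg.get("X-Spam-Flag", "").strip().upper() == "YES"
    # Either the explicit flag or a "Yes," status counts as a spam verdict.
    is_spam = flagged or status.strip().lower().startswith("yes")
    match = SCORE_RE.search(status)
    score = float(match.group(1)) if match else None
    return is_spam, score
```

A client can then refile or merely mark mail based on `is_spam`, leaving the classification itself to the server-side SpamAssassin run.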

A truly new feature in 1.5 is phishing detection. A few simple rules have been added to detect phishy links; essentially, a message will be flagged if a URL contains a numeric IP address or the link text contains an address which fails to match the link destination. In these cases, clicking on a suspect link will result in a dialog explaining the situation and asking if the user wishes to proceed. Thunderbird will also mark such messages with a line saying "Mail/News thinks this message might be an email scam."
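The two rules described above can be approximated in a few dozen lines. This is a rough Python sketch under stated assumptions, not Thunderbird's implementation: it only inspects HTML `<a>` tags, treats link text as "an address" if it contains a dot and no spaces, and compares hostnames only. The helper names are hypothetical.

```python
from html.parser import HTMLParser
from ipaddress import ip_address
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(url):
    # Link text such as "www.paypal.com" has no scheme; add one so urlparse works.
    if "://" not in url:
        url = "http://" + url
    return (urlparse(url).hostname or "").lower()


def is_numeric_host(host):
    try:
        ip_address(host)
        return True
    except ValueError:
        return False


def suspicious_links(html):
    """Flag links whose destination is a raw IP or differs from the host in the text."""
    parser = LinkCollector()
    parser.feed(html)
    flagged = []
    for href, text in parser.links:
        dest = host_of(href)
        if is_numeric_host(dest):
            flagged.append((href, "numeric IP address"))
        elif "." in text and " " not in text and host_of(text) != dest:
            flagged.append((href, "link text names a different host"))
    return flagged
```

Heuristics this simple are easy to evade (a phisher can use an innocuous domain and neutral link text), which is consistent with the shortcomings discussed next.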

This capability is a step in the right direction, but it has some obvious shortcomings. It failed to detect a number of random phishes found in your editor's mailbox. The "this might be junk" message also overrides the phishing warning; arguably the scam warning should take priority. The real risk, though, is that users might think that, if Thunderbird does not flag a message, it must be legitimate. Remember, these are people who fall for phishing scams in the first place.

The best way to avoid that possibility would be to improve the detection of phishing messages. One wonders whether the Bayesian filter could be trained to recognize phishing as well as spam. There is also ample opportunity for cooperation with anti-phishing groups which maintain lists of known phishing sites - though one would have to be careful to preserve a user's privacy when checking links.

Quibbles aside, Thunderbird 1.5 is a step in the right direction toward a more secure email environment. More work clearly remains to be done - but that is likely to always be the case. Meanwhile, tools which help to reduce the spam and phishing problems can only be a good thing.



Mail filtering in Thunderbird 1.5

Posted Oct 13, 2005 3:39 UTC (Thu) by djfoobarmatt (guest, #6446)

If the spam filter in Thunderbird 1 makes a mistake, you can click on a 'not junk' button when you select the false positive (which might have been moved to the junk folder), or click on the trashcan next to the item in the listing and then drag it back to the inbox. It would be nice if the 'not junk' button also moved the email back to the inbox. Sounds like 1.5 hasn't got that. I agree that it learns faster than SpamAssassin, which seems to be because SpamAssassin uses the Bayesian filter as just one of its criteria for rejecting spam (along with blacklists and all sorts of other configurations), whereas Thunderbird seems to use the Bayesian filter alone.

Also, Thunderbird 1 can whitelist an address book, which reduces false positives too. I assume 1.5 still has this.

Mail filtering in Thunderbird 1.5

Posted Oct 13, 2005 9:36 UTC (Thu) by thomask (guest, #17985)

I have all my spam re-directed to the junk folder, but when I have time I tend to check through for false positives: these do happen occasionally. Of course, that might be to do with the fact that my ISP is quite aggressive about flagging up potential spam by altering the subject line.

Mail filtering with SpamAssassin

Posted Oct 13, 2005 13:14 UTC (Thu) by shane (subscriber, #3335)

I find that using SpamAssassin + procmail is good, because spam determination is not "yes/no". I refile things that are definitely spam (lots of SpamAssassin points) into a "junk" folder, and things that are probably spam (a few SpamAssassin points) into a "junk-check" folder. I look at the contents of "junk-check" every week or so.

In the past 2 years or so, I have had something like 2 false positives in the "junk-check", and none in the "junk".

This system saved a lot of time I was wasting looking at spam.
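The two-tier scheme above boils down to a pair of thresholds on the SpamAssassin score. The comment does not give the actual cutoffs, so the numbers below are hypothetical (5.0 happens to be SpamAssassin's default `required_score`). A real setup would express this as procmail recipes; this Python sketch only illustrates the decision.

```python
def choose_folder(score, junk_threshold=10.0, check_threshold=5.0):
    """Map a SpamAssassin score to a folder; both thresholds are hypothetical."""
    if score >= junk_threshold:
        return "junk"        # lots of points: almost certainly spam
    if score >= check_threshold:
        return "junk-check"  # a few points: review occasionally
    return "inbox"
```

Keeping a middle band means a borderline score costs a weekly glance rather than a lost message, which matches the low false-positive experience described.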

Phishing in Thunderbird 1.5

Posted Oct 13, 2005 15:29 UTC (Thu) by smoogen (subscriber, #97)

I think that the phishing algorithm should be improved, but it will not stop people from being lulled into getting hooked. The better the algorithm, the more likely they will think that something bad is legit, and the only way to fix this is user training.

Phishing in Thunderbird 1.5

Posted Oct 13, 2005 17:20 UTC (Thu) by gswoods (subscriber, #37)

I agree on the user training part; there is unfortunately no substitute.

I just wanted to add that ClamAV, which is actually anti-virus software, finds a lot of phishing mail. I presume it is using a virus-scanner-like approach of just looking for certain byte signatures, but on my system at work (with about 1200 users), it catches thousands of these per day with no false positives. In my personal mail stream, more messages classified as "viruses or other malware" are phishing scams than actual viruses.

Since ClamAV is run before SpamAssassin, I cannot say how many of these would have also been flagged by SpamAssassin.

Mail filtering in Thunderbird 1.5

Posted Oct 14, 2005 0:23 UTC (Fri) by arcticwolf (guest, #8341)

I can say from personal experience that bayesian filtering works just fine for phishing messages, too. I used to use bogofilter+procmail before switching to Gmail myself, and trained it to treat phishing messages as spam - and it worked perfectly.

Bayesian phish filtering

Posted Oct 14, 2005 0:31 UTC (Fri) by corbet (editor, #1)

Interesting. The SA bayesian filter is frighteningly effective - for me, 4000 spams/day filters down to something like 20 that actually get through, with almost zero false positives. But a large portion of what does get through is phishing spam. Somehow, they look different than my spam, at least, and the filter never does catch them all. That's why I've started wondering if it might be worthwhile to train a filter separately for phish.

Mail filtering in Thunderbird 1.5

Posted Oct 14, 2005 16:06 UTC (Fri) by zblaxell (subscriber, #26385)

Bayesian filters learn to reject some class of input and accept some other class of input, so in theory they can separate "spam+phishing" and "ham" automatically. Unfortunately, Bayesian filters assume that the input classes use and reuse distinct sets of tokens.

When the input contains only "ham" tokens and "new" tokens (which would seem to describe a phishing scam fairly well), the Bayesian algorithm can't ever return a "spam" result. A competent phisher will send you an email that you would legitimately receive anyway (e.g. a notice from eBay, using words that eBay uses in its routine customer email communication) except for a few new hapaxes (words that appear for the first time in that email message), such as the URL for the phisher's site. Bayesian algorithms cannot identify such messages as spam, since the mail contains only known non-spam tokens and unknown tokens, which result in either a "probably ham" response or an "unknown" response.
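The argument can be checked with the usual naive-Bayes combining rule, P = Πp / (Πp + Π(1 − p)), over per-token spam probabilities. This sketch assumes unseen tokens get a neutral 0.5 (some filters use a value slightly below 0.5, which only pushes further toward ham); the token table is invented for illustration.

```python
from math import prod

UNKNOWN = 0.5  # assumed probability for never-seen tokens (hapaxes)


def combine(token_probs):
    """Naive-Bayes combination of per-token spam probabilities."""
    spam_part = prod(token_probs)
    ham_part = prod(1.0 - p for p in token_probs)
    return spam_part / (spam_part + ham_part)


def message_spamminess(tokens, table):
    return combine([table.get(t, UNKNOWN) for t in tokens])


# Hypothetical learned probabilities: these words appear in real eBay mail.
table = {"ebay": 0.2, "account": 0.3, "verify": 0.4}
phish = ["ebay", "account", "verify", "http://ebay-verify.example"]
```

A 0.5 token multiplies both products by the same factor, so it cannot move the result; every known token here is below 0.5, so the phish scores well under 0.5 (about 0.067) however many new tokens it carries.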

In some cases the Bayesian filter works on phishing scams anyway. For example, I don't have an eBay or PayPal account, and I get virtually zero mail from anyone who does, so at the moment my Bayesian filter thinks that you're a spammer or phisher if you merely mention those names. It's hard to tell exactly what a Bayesian filter (or any learning filter for that matter) has learned without taking it apart and looking at its data tables.

Some non-Bayesian approaches might work better on phishing. Actually it would be really nice to have a Grumpy Editor's guide to spam filters. There are a few out there--crm114 is a treasure trove of esoteric algorithms (and can be used for things like syslog analysis too), there is at least one project using Markov 2-word chains (good for those spams that quote lots of random text to look more like legitimate email), and all of the Bayesian classifiers have slightly different implementation details.
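The word-pair idea starts from a different choice of tokens: adjacent word pairs in addition to single words. The intuition is that padding made of random quoted text is built from common words but tends to produce unusual pairs. This is a hypothetical sketch of that tokenization step only, not the algorithm of crm114 or any particular project.

```python
def pair_tokens(text):
    """Return single-word tokens plus adjacent word pairs (hypothetical feature set)."""
    words = text.lower().split()
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs
```

The resulting tokens can be fed into any of the classifiers discussed above in place of single words.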

Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds