SpamAssassin 3.4.1 released
The "auto whitelist" (AWL) feature of SpamAssassin has long been one of that program's more annoying aspects. In theory it tracks the emails from each sender to get an overall sense of whether they are trustworthy; email from a trusted source will get a bonus score, while messages from apparent spammers will be penalized. The sad truth of the matter, in your editor's experience, is that a spammer need only get a small number of messages through to convince the AWL that everything else should be whitelisted. If SpamAssassin's other scoring mechanisms were perfect, this kind of AWL corruption would not be a problem — but then the AWL would not be needed at all. In a world where scoring is imperfect, the AWL often seems to make things worse.
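The following minimal Python sketch of AWL-style averaging shows how a few successful messages can skew the result. It assumes the rule documented for SpamAssassin's `auto_whitelist_factor` setting (default 0.5): the final score is `score + (mean - score) * factor`, where `mean` is the sender's historical average. The class and function names are hypothetical. The real AWL keeps its totals in a database keyed on the sender address and originating network, not in memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SenderHistory:
    """Running totals for one sender (hypothetical storage layout)."""
    total: float = 0.0
    count: int = 0

    def mean(self) -> Optional[float]:
        return self.total / self.count if self.count else None


def awl_adjust(score: float, history: SenderHistory, factor: float = 0.5) -> float:
    """Pull a message's score toward the sender's historical mean.

    Assumes finalscore = score + (mean - score) * factor. A sender with no
    history is left unchanged. The raw (unadjusted) score is then recorded,
    so it feeds the mean applied to later messages.
    """
    mean = history.mean()
    final = score if mean is None else score + (mean - score) * factor
    history.total += score
    history.count += 1
    return final


# Three low-scoring messages from a spammer slip through...
h = SenderHistory()
for _ in range(3):
    awl_adjust(1.0, h)
# ...so a message that scores 8.0 on its own is pulled down to 4.5.
print(awl_adjust(8.0, h))  # 4.5
```

That last message would pass under SpamAssassin's default `required_score` of 5.0. The sketch also assumes that the raw score, not the adjusted one, is stored for later averaging.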
In 3.4.1, the SpamAssassin developers have tried to address some of the problems with the AWL by replacing it with a new mechanism called TxRep. The basic idea remains the same: track each sender's activity and adjust the score of new messages toward the mean of what has been seen in the past. But a number of useful changes have been made in how this tracking is done, starting with an expansion of the set of data that is used. TxRep maintains reputation scores for the sending email address (as did the AWL), but also the sending domain name, the IP addresses of the originating system and the server that transferred the message, and the "HELO" string used by the last server. For any given message, each of these quantities is mixed in with its own (user-configurable, naturally) weight.
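The multi-identity idea can be sketched under a few stated assumptions. Each identity (address, domain, originating IP, relaying IP, HELO string) has a historical mean score. Those means are combined as a weighted average, and the message score is then pulled toward that average in the same way as with the AWL. The identity keys and weights below are hypothetical placeholders, not TxRep's defaults or configuration names.

```python
from typing import Mapping, Optional

# Hypothetical per-identity weights; the real ones are user-configurable
# and these numbers are placeholders.
WEIGHTS = {
    "email": 3.0,
    "domain": 2.0,
    "origin_ip": 4.0,
    "relay_ip": 1.0,
    "helo": 0.5,
}


def combined_reputation(means: Mapping[str, Optional[float]],
                        weights: Mapping[str, float] = WEIGHTS) -> Optional[float]:
    """Weighted average of the historical mean score of each identity.

    Identities with no history (missing or None) are skipped; if none has
    any history, there is no reputation to apply.
    """
    num = den = 0.0
    for key, weight in weights.items():
        mean = means.get(key)
        if mean is not None and weight > 0:
            num += weight * mean
            den += weight
    return num / den if den else None


def txrep_adjust(score: float, means: Mapping[str, Optional[float]],
                 factor: float = 0.5) -> float:
    """Pull the score toward the combined reputation, AWL-style."""
    rep = combined_reputation(means)
    return score if rep is None else score + (rep - score) * factor


# Clean-looking address and domain, but an originating IP with a bad record;
# the HELO string has no history yet, so it is ignored.
means = {"email": 1.0, "domain": 1.0, "origin_ip": 11.0, "relay_ip": 1.0, "helo": None}
print(combined_reputation(means))  # 5.0
print(txrep_adjust(3.0, means))    # 4.0
```

If only the clean-looking address were on record, the 3.0 message would be pulled down to 2.0. Because the originating IP has a poor record, the message is pushed up to 4.0 instead. Spreading reputation across several identities makes it harder for a forged address alone to carry a message through.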
Another useful change is that the sa-learn utility (until now used only with the Bayesian filter) can be used to train TxRep, so the same command now works to update both filters. There is a "dilution" mechanism that causes newer messages to have more influence on a sender's score than older ones, making the system more responsive should, say, a spammer repent and start actually sending useful stuff (or should TxRep initially misjudge a sender). TxRep can be used to whitelist (or blacklist) senders or IP addresses outright — something that might be worth doing automatically for the most obvious of spam or for messages that have been explicitly classified by the recipient. There is also a mechanism to automatically whitelist the recipients of outgoing mail — though that could have undesired effects if one is prone to sending irate responses to spammers.
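TxRep's exact dilution formula is not given here. One common way to give newer messages more weight is an exponentially weighted average, sketched below. The `decay` parameter and its value of 0.9 are illustrative assumptions, not TxRep settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DilutedHistory:
    """Exponentially weighted sender history (illustrative, not TxRep's code).

    Before each new score is added, the stored totals are multiplied by
    `decay`, so a message seen k updates ago carries weight decay**k.
    With decay=1.0 this reduces to a plain running average.
    """
    decay: float = 0.9
    total: float = 0.0
    weight: float = 0.0

    def add(self, score: float) -> None:
        self.total = self.total * self.decay + score
        self.weight = self.weight * self.decay + 1.0

    def mean(self) -> Optional[float]:
        return self.total / self.weight if self.weight else None


# A sender that spammed (10.0) twenty times, then sent twenty clean (0.0) messages.
plain, diluted = DilutedHistory(decay=1.0), DilutedHistory(decay=0.9)
for s in [10.0] * 20 + [0.0] * 20:
    plain.add(s)
    diluted.add(s)
print(plain.mean())              # 5.0
print(round(diluted.mean(), 2))  # 1.08
```

A smaller `decay` forgets old behavior faster, for good senders and bad ones alike.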
With these changes, TxRep should be able to avoid some of the worst AWL pitfalls, though the documentation still recommends against turning on auto-learning until SpamAssassin as a whole has been tuned well. But the whole thing still seems to be built around the idea that people can be spammers part of the time and senders of legitimate email at others. Perhaps your editor is an excessively unforgiving character, but it seems like the sender of known spam should not get off lightly with a gradual tweaking of a reputation score; once a spammer, always a spammer. Trust is hard to earn but easy to lose; the TxRep mechanism still doesn't quite reflect that fact.
The PDFInfo module, which has long existed outside of the SpamAssassin mainline, has now been merged; PDFInfo, as its name suggests, looks for spammy PDF attachments. There is one other new module, URILocalBL, which allows blacklisting of spammy links using a local database.
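The sketch below gives a rough idea of blacklisting links from a local list. It extracts URLs from a message body and checks each host against a set of domains, including parent domains. For raw IP addresses, it checks against a set of network ranges instead. The list contents and function names are hypothetical. The new module's actual matching criteria and configuration syntax are not reproduced here.

```python
import ipaddress
import re
from typing import List
from urllib.parse import urlsplit

# Hypothetical local blocklist (example domains and a documentation range).
BLOCKED_DOMAINS = {"spammy.example", "bad-pills.example"}
BLOCKED_NETS = [ipaddress.ip_network("192.0.2.0/24")]

# Deliberately simple URL matcher; real mail needs far more careful parsing.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def host_is_blocked(host: str) -> bool:
    """True if an IP-literal host is in a blocked range, or a hostname or any parent domain is listed."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        labels = host.lower().rstrip(".").split(".")
        # Try "www.spammy.example", then "spammy.example", then "example".
        return any(".".join(labels[i:]) in BLOCKED_DOMAINS for i in range(len(labels)))
    return any(addr in net for net in BLOCKED_NETS)


def blocked_links(body: str) -> List[str]:
    """Return the URLs in body whose host is on the local blocklist."""
    hits = []
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:  # e.g. a malformed IPv6 literal
            continue
        if host and host_is_blocked(host):
            hits.append(url)
    return hits
```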
SpamAssassin 3.4.1 can do a more thorough and careful job of normalizing all messages to the UTF-8 character set before applying rules. That should help to eliminate various tricks using strange character sets to get around the spam-checking rules.
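Normalizing first means rules see the same text regardless of how it was encoded. The minimal Python sketch below uses the standard `email` package. Each text part is decoded according to its declared charset into a Unicode string, which rules can match directly or which can be re-encoded as UTF-8. The extra NFKC step folds lookalike forms such as fullwidth letters; it is an illustrative addition, and this is not a description of how SpamAssassin itself does the job.

```python
import unicodedata
from email import message_from_bytes, policy
from typing import List


def normalized_text_parts(raw: bytes) -> List[str]:
    """Decode every text/* part to Unicode, then apply NFKC normalization.

    get_content() undoes the transfer encoding and decodes using the part's
    declared charset. NFKC additionally folds compatibility characters, so
    fullwidth "ＦＲＥＥ" becomes plain "FREE".
    """
    msg = message_from_bytes(raw, policy=policy.default)
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        try:
            text = part.get_content()
        except LookupError:
            # Unknown charset name: fall back to lossy UTF-8 decoding.
            text = (part.get_payload(decode=True) or b"").decode("utf-8", errors="replace")
        parts.append(unicodedata.normalize("NFKC", text))
    return parts
```

A base64-encoded part declared as `charset=iso-8859-1` comes out as the same Unicode `"café"` that a UTF-8 part would produce, so a single rule can match both.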
An interesting addition to the Bayesian filter is the ability to hash MIME attachments and use the result as a filter token. If it works well, it should allow the filter to recognize often-repeated spam payloads as a whole. But, as the manual page notes, "not much experience has yet been gathered regarding its usefulness." It seems worth a try, in any case.
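Here is a minimal sketch of the idea: hash the decoded payload of each non-text MIME part and use the digest as one extra token for the classifier. Several details are assumptions made for illustration, not the format SpamAssassin uses: hashing only non-text parts, the choice of SHA-1, and the `MIME:` token prefix.

```python
import hashlib
from email import message_from_bytes, policy
from typing import List


def attachment_tokens(raw: bytes) -> List[str]:
    """Turn each non-text, non-multipart MIME part into a single hash token.

    The decoded payload is hashed, so the transfer encoding and the
    filename do not affect the resulting token.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    tokens = []
    for part in msg.walk():
        if part.is_multipart() or part.get_content_maintype() == "text":
            continue
        payload = part.get_payload(decode=True) or b""
        digest = hashlib.sha1(payload).hexdigest()
        tokens.append(f"MIME:{digest}")  # hypothetical token prefix
    return tokens
```

Because the digest covers the decoded payload, an attachment yields the same token whether it arrives base64- or quoted-printable-encoded. However, changing even a single byte yields an unrelated token, which may help explain why the feature's usefulness is still an open question.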
Beyond all of this work, of course, is the constant challenge of maintaining the rule base in the face of a changing spammer landscape. Spammers may now be more concerned with getting past Gmail's filters than SpamAssassin, but there are still signs that a subset of spam has been tested against SpamAssassin until the rules are unable to stop it. The Bayesian filter helps with that problem, but so does an ongoing effort to keep those rules current. It is thus unsurprising that a new SpamAssassin release contains a long list of rule changes that should help to keep its effectiveness up — until the spammers work around those as well.
Your editor has often heard the complaint that email is reaching a point of complete uselessness. Such claims overstate the reality; one need only watch how email keeps our development communities going to see that. But email has been under attack for many years, making life harder for both email users and those who are charged with running email systems. It is fair to say that SpamAssassin is one of a small set of tools that have helped email to survive the ongoing spammer onslaught, so it is good to see this tool continuing to evolve.
Posted May 7, 2015 11:15 UTC (Thu)
by hickinbottoms (subscriber, #14798)
"But the whole thing still seems to be built around the idea that people can be spammers part of the time and senders of legitimate email at others. Perhaps your editor is an excessively unforgiving character, but it seems like the sender of known spam should not get off lightly with a gradual tweaking of a reputation score; once a spammer, always a spammer. Trust is hard to earn but easy to lose; the TxRep mechanism still doesn't quite reflect that fact." I wonder whether this is designed around the case where malware that has harvested an address book then poses as that user to send mail to their known contacts. I've certainly seen this (fortunately rarely) amongst my contacts with the result that I occasionally get malware spam from friends. In such a case I wouldn't expect to blacklist them forever on a first-strike basis. Great news to hear about the update, though, and I second that SA is helping to keep email very much alive and well.
Posted May 7, 2015 17:17 UTC (Thu)
by smoogen (subscriber, #97)
Posted May 7, 2015 12:48 UTC (Thu)
by hmh (subscriber, #3838)
I've long been using clamav plus several long-term and fast-response spam/phishing/malware signature databases to score MIME attachments that match known signatures, and it is extremely useful, as well as very fast. This is not done directly in SpamAssassin, though. Amavisd-new is used, as I want to score based on the clamav results, as well as test each attachment against the signature database (and not just the whole email). It calls SA should the score (mapped using a table, keyed on regex matches on the "virus name" returned by clamav) not be enough to determine the message's fate.
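The regex-keyed score table this comment describes might look something like the following Python sketch. It is not Amavisd-new's configuration syntax. The patterns, the scores, and the threshold above which SpamAssassin is skipped are all hypothetical.

```python
import re
from typing import List, Optional

# Hypothetical table of (pattern on the scanner's "virus name", score);
# the first matching pattern wins, so order matters.
NAME_SCORES = [
    (re.compile(r"\.Phishing\.", re.IGNORECASE), 20.0),
    (re.compile(r"\.Spam\.", re.IGNORECASE), 6.0),
    (re.compile(r"^Heuristics\.", re.IGNORECASE), 3.0),
]

# Hypothetical cut-off: at or above this, the message's fate is decided
# without running SpamAssassin.
DECISIVE_SCORE = 10.0


def score_for(name: str) -> Optional[float]:
    """Score for one detection name, or None if no pattern matches."""
    for pattern, score in NAME_SCORES:
        if pattern.search(name):
            return score
    return None


def needs_spamassassin(detections: List[str]) -> bool:
    """Sum the per-attachment detection scores; defer to SpamAssassin if not decisive."""
    total = sum(score_for(name) or 0.0 for name in detections)
    return total < DECISIVE_SCORE
```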
Posted May 7, 2015 14:07 UTC (Thu)
by madscientist (subscriber, #16861)
On the other hand, I've used bogofilter to absolutely amazing effect: after a day or two of training I get very little spam let through and even less ham caught. Really, bogofilter is amazing.
Posted May 7, 2015 16:18 UTC (Thu)
by RobSeace (subscriber, #4435)
Posted May 14, 2015 10:05 UTC (Thu)
by ssokolow (guest, #94568)
...or you could just install an instance of SpamGourmet and hand out a different revocable alias to each person. I get maybe 2 or 3 spam a year by treating e-mail addresses as revocable API keys.