Keeping spamassassin current

Posted Mar 4, 2004 2:55 UTC (Thu) by bronson (subscriber, #4806)
Parent article: Keeping spamassassin current

It's true that SpamAssassin effectiveness declines over time after a release. That time can be a matter of hours. SA has such wide adoption that it is a primary target of spammers.

A possible solution? Include NO rules -- only code. This isn't quite as strange as it might sound. You would download SA Perl modules, then install whatever ruleset best fits your frame of mind. Spammers would have a much harder time trying to make silver bullet spams then.


Adaptive weighting?

Posted Mar 4, 2004 3:02 UTC (Thu) by Ross (subscriber, #4065) [Link]

Why can't they do some kind of Bayesian network based not on single words
from the text but on the output from the various SpamAssassin rules? Then
they wouldn't have to fine-tune the weights and individual users would get
weights that matched the spam in their inboxes better. This would also
make it more difficult for spammers to test against the rule base since
they wouldn't know which rules are weighted heavily and which lightly.
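
As a rough illustration of the idea above, here is a minimal sketch of a naive Bayes classifier whose features are the rules that fired rather than the words in the message. It is not SpamAssassin code: the class, the method names, and the rule names (`RULE_A` and so on) are hypothetical, and a real Bayesian network would also model dependencies between rules, which this sketch ignores.

```python
import math
from collections import defaultdict


class RuleHitBayes:
    """Naive Bayes over which rules fired (hypothetical, not SA's API)."""

    def __init__(self):
        self.msg_count = {"spam": 0, "ham": 0}
        self.hit_count = {"spam": defaultdict(int), "ham": defaultdict(int)}
        self.rules = set()

    def learn(self, fired_rules, label):
        """Record one hand-classified message; label is 'spam' or 'ham'."""
        self.msg_count[label] += 1
        for rule in set(fired_rules):
            self.hit_count[label][rule] += 1
            self.rules.add(rule)

    def _log_likelihood(self, fired, label):
        n = self.msg_count[label]
        total = 0.0
        for rule in self.rules:
            # Laplace smoothing keeps every probability strictly in (0, 1).
            p_hit = (self.hit_count[label][rule] + 1) / (n + 2)
            total += math.log(p_hit if rule in fired else 1.0 - p_hit)
        return total

    def spam_probability(self, fired_rules):
        """Posterior probability of spam given the set of rules that fired."""
        fired = set(fired_rules)
        n_all = self.msg_count["spam"] + self.msg_count["ham"]
        log_spam = math.log((self.msg_count["spam"] + 1) / (n_all + 2))
        log_ham = math.log((self.msg_count["ham"] + 1) / (n_all + 2))
        log_spam += self._log_likelihood(fired, "spam")
        log_ham += self._log_likelihood(fired, "ham")
        diff = log_ham - log_spam
        if diff > 700:  # avoid overflow in math.exp
            return 0.0
        return 1.0 / (1.0 + math.exp(diff))


# Each user trains on their own mail, so the effective weight of a rule
# differs from one installation to the next.
clf = RuleHitBayes()
clf.learn(["RULE_A", "RULE_B"], "spam")
clf.learn(["RULE_A"], "spam")
clf.learn(["RULE_C"], "ham")
clf.learn([], "ham")
print(clf.spam_probability(["RULE_A"]))
print(clf.spam_probability(["RULE_C"]))
```

Because the per-rule statistics come from the user's own mail, a spammer testing against a stock install learns little about how any particular inbox weighs each rule.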

Adaptive weighting?

Posted Mar 4, 2004 4:34 UTC (Thu) by proski (subscriber, #104) [Link]

Sounds like an excellent idea! I hope you will share it with the SpamAssassin developers. Complex rules are SpamAssassin's strength. The way they are combined (addition) is SpamAssassin's weakness. The predictability of the score is another weakness. Let's get rid of the weaknesses.

Adaptive weighting?

Posted Mar 4, 2004 6:09 UTC (Thu) by mkettler (guest, #3933) [Link]

SpamAssassin has had a Bayesian filter, in addition to the rules, for the past 8 releases. The first version with a Bayes subsystem was 2.50, released in February 2003.

No need to share this with the sa-devs.. they clued in a long time ago.

Adaptive weighting?

Posted Mar 4, 2004 20:23 UTC (Thu) by skybrian (subscriber, #365) [Link]

It sounds like you misunderstood the point. Unless something changed since 2.50, the bayesian filter is just a separate set of rules. The weight on each rule (including the Bayesian rules) is static.

Adaptive weighting?

Posted Mar 4, 2004 10:33 UTC (Thu) by nix (subscriber, #2304) [Link]

This has been tried, and wasn't terribly effective.

(Justin posted some test results on this to the sa-dev list maybe a year ago.)

Keeping spamassassin current

Posted Mar 4, 2004 10:37 UTC (Thu) by nix (subscriber, #2304) [Link]

This is 'so bad it's not even wrong', like saying 'include no functionality, only code'. Many of the rules consist of code; many of the rest depend on subtle details of a particular SA implementation or on features only present in recent SA versions; and the whole lot is scored by a genetic algorithm (GA), so it forms an integrated whole. (In SA 3.0, they'll be scored by a perceptron instead, which may actually be fast enough that releasing SA more often will be practical. As it is, the GA run adds over a week to release times.)

Most of the ancillary rulesets have arbitrarily assigned scores, so they might actually reduce the effectiveness of SA as a whole. This is likely if SA is already spotting most or all of your spam, in which case adding large numbers of non-GA-scored rules is likely to increase false positives (FPs).

See the SA Wiki page on independently releasing rules.
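
To make the point about scores being fitted jointly more concrete, here is a minimal sketch of a classic perceptron that learns one weight per rule from labelled mail. This is only the textbook algorithm, not the scorer planned for SA 3.0; the function name, the rule names (`R1`, `R2`, `R3`), and the learning rate are all assumptions chosen for illustration.

```python
def train_perceptron(samples, rules, epochs=20, lr=0.1):
    """Fit one weight per rule plus a bias.

    samples: list of (set_of_fired_rules, is_spam) pairs.
    A message is called spam when bias + sum(weights of fired rules) > 0.
    """
    weights = {rule: 0.0 for rule in rules}
    bias = 0.0
    for _ in range(epochs):
        for fired, is_spam in samples:
            score = bias + sum(weights[r] for r in fired)
            predicted_spam = score > 0
            if predicted_spam != is_spam:
                # Nudge every rule that fired toward the correct answer.
                step = lr if is_spam else -lr
                bias += step
                for r in fired:
                    weights[r] += step
    return weights, bias


samples = [
    ({"R1"}, True),
    ({"R2"}, False),
    ({"R1", "R3"}, True),
    (set(), False),
]
weights, bias = train_perceptron(samples, rules={"R1", "R2", "R3"})
print(weights, bias)
```

Each update touches every rule that fired on a misclassified message, so the weights are fitted as a set. That is why dropping in extra rules with hand-picked scores can upset the balance.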

Keeping spamassassin current

Posted Mar 4, 2004 18:10 UTC (Thu) by Cato (subscriber, #7643) [Link]

I find that SpamAssassin takes many weeks to decline in effectiveness after a new release - for a long time I was running a very old release and it was fine. If you are able to write your own rules, or just adjust the existing scores to your spam, you should be fine. The rules are still important even when the Bayesian classifier is in use, because messages scoring over (by default) 15 are used to 'autolearn' spam - without useful rules you would need to spend some time telling SA which messages are spam.

It is very rare that someone manages to get a spam past SA these days, at least with my custom rules.
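
The interaction between rules and autolearning described above can be sketched roughly as follows. This is a simplified illustration and not SpamAssassin's implementation: the function names, the rule names, and the token store are hypothetical, and the threshold of 15 is simply the figure quoted in the comment.

```python
AUTOLEARN_SPAM_THRESHOLD = 15.0  # figure quoted in the comment above; an assumption here


def rule_score(fired_rules, weights):
    """Additive scoring: the sum of the static weights of the rules that fired."""
    return sum(weights.get(rule, 0.0) for rule in fired_rules)


def maybe_autolearn(tokens, fired_rules, weights, spam_token_counts):
    """Feed a message to the Bayes token store only if the rules alone
    score it above the autolearn threshold. Returns True if it was learned."""
    if rule_score(fired_rules, weights) > AUTOLEARN_SPAM_THRESHOLD:
        for token in set(tokens):
            spam_token_counts[token] = spam_token_counts.get(token, 0) + 1
        return True
    return False


# Hypothetical rule weights and messages.
weights = {"RULE_X": 10.0, "RULE_Y": 6.0}
spam_tokens = {}
print(maybe_autolearn(["cheap", "pills"], ["RULE_X", "RULE_Y"], weights, spam_tokens))
print(maybe_autolearn(["hello"], ["RULE_X"], weights, spam_tokens))
print(spam_tokens)
```

If the rules never push a message over the threshold, nothing is learned automatically, and training falls back to the user marking spam by hand.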

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds