Spam filtering techniques (developerWorks)

developerWorks looks at six different ways to deal with spam. "At first blush, it would be reasonable to suppose that a set of hand-tuned and laboriously developed rules like those in SpamAssassin would predict spam more accurately than a scattershot automated approach. It turns out that this supposition is dead wrong. A statistical model basically just works better than a rule-based approach"

Depends on the mail feed...

Posted Jan 29, 2003 16:00 UTC (Wed) by emk (guest, #1128) [Link]

I get a lot of spam, and I've done some very modest research into statistical spam filtering. But so far, none of the proposed methods I've tried has matched the quality of SpamAssassin on my personal mail feed.

Spam filtering techniques (developerWorks)

Posted Jan 29, 2003 16:41 UTC (Wed) by mkettler (guest, #3933) [Link]

Agreed: Bayesian and similar statistical methods are HIGHLY effective. None of this should be news to anyone who has followed Paul Graham's whitepaper on the subject (which is not strictly a Bayesian approach, but is a statistical learning approach) and the several follow-on whitepapers discussing tweaks involving true Bayesian methods and alternative approaches.

The indisputable power of Bayesian methods is why the current development effort on SpamAssassin is working to add a trainable Bayesian-esque statistical filter. This should be released in the next version in the not-too-distant future.

It would not make sense for SA, or any other major spam filter project, to ignore the power of a Bayesian filter, although SA will still have a tuned ruleset to complement it.
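
For readers who have not seen one, here is a minimal sketch of a trainable statistical filter of the kind discussed above. It is a plain naive Bayes classifier over whitespace-split tokens with add-one smoothing. It is not Paul Graham's scoring formula, nor the code in SpamAssassin or Spambayes; the class name `TinyBayes`, its methods, and the tokenization are all assumptions made for illustration.

```python
import math
from collections import Counter


class TinyBayes:
    """Hypothetical toy naive Bayes spam classifier (illustration only)."""

    def __init__(self):
        # Per-class token counts and per-class message counts.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message under label 'spam' or 'ham'."""
        self.docs[label] += 1
        self.counts[label].update(text.lower().split())

    def spam_probability(self, text):
        """Return an estimate of P(spam | text) between 0 and 1."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        # Smoothed prior log-odds from the number of trained messages.
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for label, sign in (("spam", 1), ("ham", -1)):
            total = sum(self.counts[label].values())
            denom = total + len(vocab) + 1
            for tok in text.lower().split():
                # Add-one smoothing so unseen tokens never give log(0).
                log_odds += sign * math.log((self.counts[label][tok] + 1) / denom)
        # Clamp before exponentiating to avoid overflow on long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

The point of the sketch is only that the filter learns from the user's own mail: each call to `train` shifts the token statistics, which is what a fixed ruleset cannot do without manual tuning.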

Spambayes works well

Posted Jan 29, 2003 17:56 UTC (Wed) by nas (subscriber, #17) [Link]

I'm having very good results with Spambayes. I'm also experimenting with another idea: temporarily reject messages that look like spam. Ham will be retried; almost all spam will not be. This scheme has the added benefit of making it easier to track down the spam source before it gets into people's mailboxes. If we want to win the war on spam we need to find a way to keep the 15 or so idiots who respond to it from getting it.
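
A rough sketch of the temporary-rejection idea, under stated assumptions: the classifier verdict is passed in as a boolean, a retry is recognized by the (client IP, sender, recipient) triplet, and the message is accepted once a minimum delay has passed since the first attempt. The class name `TempfailGate`, the 300-second default, and the reply strings are illustrative choices, not taken from any existing mail server or product.

```python
import time

# SMTP-style replies used by this sketch (the text after the code is free-form).
TEMPFAIL = "451 4.7.1 Please try again later"
ACCEPT = "250 2.0.0 OK"


class TempfailGate:
    """Hypothetical gate: tempfail spam-looking mail until it is retried."""

    def __init__(self, min_retry_delay=300, clock=time.time):
        self.min_retry_delay = min_retry_delay  # seconds before a retry counts
        self.clock = clock                      # injectable for testing
        self.first_seen = {}                    # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient, looks_like_spam):
        """Return the reply to give for this delivery attempt."""
        if not looks_like_spam:
            return ACCEPT
        key = (client_ip, sender, recipient)
        now = self.clock()
        # Remember the first attempt; later attempts reuse that timestamp.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_retry_delay:
            # A real mail server retried after the delay: let it through.
            return ACCEPT
        return TEMPFAIL
```

The `first_seen` table doubles as a log of which hosts sent suspicious mail and never came back, which is the tracking benefit mentioned above.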

Tempfailing the first time

Posted Jan 29, 2003 20:47 UTC (Wed) by dskoll (subscriber, #1630) [Link]

Actually, I discovered this technique (see my posting in comp.mail.sendmail).

It is part of my commercial CanIt anti-spam system; you can see real-time statistics in the Hit-And-Run Report (login demo/demo). -- David F. Skoll

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds