The MIT 2004 Spam Conference

Posted Jan 22, 2004 15:29 UTC (Thu) by RobSeace (subscriber, #4435)
Parent article: The MIT 2004 Spam Conference

> how spammers could beat spam filters by using filters like POPFile to
> detect "good" words to get through a spam filter.

That really won't work very well. The whole point of Bayesian filtering
(and the reason it requires personal training) is that everyone's own
"good" words are going to be different, so no spammer is going to be able
to magically determine what everyone's "good" words are. Sure, they might
manage to determine some that have a high probability of being in most
people's "good" list, but at the same time they ALSO need to use some "bad"
words in order to pitch whatever the hell it is they're selling. So their
messages are unlikely to appear as truly "good" messages, no matter WHAT
they do, short of silencing their own pitch entirely and actually forging
a legit-looking message of some sort.
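
To make the per-user point concrete, here is a minimal sketch of a toy
naive Bayes scorer. It is an illustration under stated assumptions, not the
algorithm used by POPFile, SpamBayes or BogoFilter: the function names, the
smoothing and the tiny training sets for "alice" and "bob" are all made up.
It shows the same padded spam being scored against two differently trained
filters.

```python
import math
from collections import Counter

def train(ham_msgs, spam_msgs):
    """Count, per class, how many messages contain each word."""
    ham = Counter(w for m in ham_msgs for w in set(m.lower().split()))
    spam = Counter(w for m in spam_msgs for w in set(m.lower().split()))
    return ham, len(ham_msgs), spam, len(spam_msgs)

def token_prob(word, model):
    """Estimate P(spam | word) with add-one smoothing (an assumption)."""
    ham, n_ham, spam, n_spam = model
    p_word_spam = (spam[word] + 1) / (n_spam + 2)
    p_word_ham = (ham[word] + 1) / (n_ham + 2)
    return p_word_spam / (p_word_spam + p_word_ham)

def spam_score(message, model):
    """Combine per-word evidence by summing log-odds; returns 0.0 to 1.0."""
    log_odds = 0.0
    for word in set(message.lower().split()):
        p = token_prob(word, model)
        log_odds += math.log(p) - math.log(1.0 - p)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical training data: both users see the same spam, but their
# legitimate mail (and therefore their "good" words) differs.
shared_spam = ["cheap pills buy now", "buy cheap watches now",
               "cheap loans buy today"]
alice = train(["kernel patch review", "kernel scheduler patch",
               "compile the kernel module"], shared_spam)
bob = train(["garden tomato harvest", "tomato seeds for the garden",
             "garden club meeting"], shared_spam)

pitch = "buy cheap pills"
padded = "kernel patch " + pitch   # spammer guesses Alice's "good" words

for name, model in (("alice", alice), ("bob", bob)):
    print(name, round(spam_score(pitch, model), 3),
          round(spam_score(padded, model), 3))
```

In this toy setup the padding lowers the score only for the user whose
"good" words were guessed, and the pitch words still keep the message on
the spam side. For the other user the padding words are simply unknown and
change nothing.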

I, for one, haven't had much of a problem with the recent trend of spammers
inserting various random "good" words into their messages. Both SpamBayes
and BogoFilter seem to handle it just fine, and have no trouble seeing
through their trickery. It's only very rarely that a particularly
clever/lucky spam will make it through these days. (The other day, I got
one loaded with tons of techy computer-related terms, which slipped through
with a score of 0.00! Impressive... But after one run through the
retraining, it scored 1.00. There were enough non-good words in it to key
off of for proper identification, I guess. I suspect it was the gratuitous
misspellings that were all over the place. Hell, one could almost construct
a reliable spam filter using nothing more than ispell/aspell, given
spammers' seeming love of deliberately misspelling everything... ;-))
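
The ispell/aspell idea can be sketched roughly as follows. This is only a
toy: it does not call ispell or aspell at all, and the small in-memory word
set, the function name and the sample messages are hypothetical stand-ins
for a real dictionary and real mail.

```python
import re

def misspelling_ratio(text, dictionary):
    """Return the fraction of tokens in text that are not in dictionary."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if token not in dictionary)
    return unknown / len(tokens)

# Hypothetical stand-in for a spell checker's word list.
known_words = {"the", "patch", "fixes", "a", "bug", "in", "scheduler",
               "buy", "cheap", "pills", "now"}

legit = "The patch fixes a bug in the scheduler"
spam = "Buy ch3ap p1lls now"

print(misspelling_ratio(legit, known_words))
print(misspelling_ratio(spam, known_words))
```

A real filter would need a threshold and some way of coping with names,
jargon and honest typos, so at best this would be one signal among many.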


Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds