Bayesian filtering isn't a panacea
Posted Oct 16, 2003 3:53 UTC (Thu) by
rfunk (subscriber, #4054)
Parent article:
Open spam filtering rules considered harmful?
You say Bayesian filters cannot be worked around in any easy way, but in
my experience that's not entirely true.
I don't use SpamAssassin (yet), but I do use bogofilter, which is a
purely Bayesian approach. I get around 100-130 spams a day, and these
days about 2-8 per day actually make it to my inbox.
Those 2-8 messages that bogofilter doesn't catch tend to use techniques
such as putting their payload in the HTML part of a multipart/alternative
message, and some innocuous book excerpt in the text part. Or they stick
lots of random words (or just random letters) in the message. Or they
use creative misspelling to avoid the words that would definitively flag
the message as spam. Some are just short HTML messages that try to load
their content from elsewhere. And a few actually read like a handcrafted
message, with little indication of their spammy nature.
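To illustrate why word padding and creative misspelling can work, here is a minimal Python sketch of a toy naive Bayes scorer. It is not bogofilter's actual algorithm; the tiny corpus, the function names, the add-one smoothing, and the equal priors are all assumptions made up for this example.

```python
import math
from collections import Counter

def train(messages):
    """Count how many messages contain each token (hypothetical helper)."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts

def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Toy naive Bayes score with add-one smoothing and equal priors."""
    log_odds = 0.0
    for token in set(text.lower().split()):
        p_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))

# Made-up training data, for illustration only.
spam = ["buy cheap pills now", "cheap pills online", "buy pills cheap today"]
ham = [
    "meeting notes for the kernel patch",
    "the patch review meeting is today",
    "notes on the kernel release",
]
spam_counts, ham_counts = train(spam), train(ham)

def score(text):
    return spam_probability(text, spam_counts, ham_counts, len(spam), len(ham))

plain_score = score("buy cheap pills")
# Same payload, padded with words that usually appear in legitimate mail.
padded_score = score("buy cheap pills the kernel patch meeting notes")
# Same payload, with the telltale words misspelled so they are unseen tokens.
misspelled_score = score("buy ch3ap p1lls")

print(f"plain:      {plain_score:.2f}")
print(f"padded:     {padded_score:.2f}")
print(f"misspelled: {misspelled_score:.2f}")
```

On this toy corpus the plain message scores close to 1, the padded one drops below 0.5, and the misspelled one lands in between. Unseen tokens are neutral in this sketch, so padding only helps the spammer when the added words resemble the recipient's legitimate mail, which is presumably why a book excerpt is a popular choice.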
I probably need to tune my bogofilter setup, and I may move to
SpamAssassin in order to avoid relying entirely on Bayesian filtering.
Bayesian filtering is a wonderful invention and certainly the single most
effective spam content filter, but in a world where spammers creatively
adjust to the filters, it can't do the whole job on its own.