The best of both worlds - a hybrid approach?
Posted Feb 23, 2006 1:57 UTC (Thu) by wstearns
Parent article: The Grumpy Editor's guide to bayesian spam filters
Good comparison, Jon - thanks.
If you'll allow me to paraphrase, SpamAssassin (bias alert - I
contribute to SpamAssassin) is accurate but slow. Other tools are faster,
but not quite as accurate. How about getting the speed of other tools and
the accuracy of SpamAssassin?
Picture procmail first handing the message off to a fast filter
such as bogofilter, CRM114, or DSPAM. That filter is told to give a verdict
only when it's certain a message is ham or spam, which would probably mean
adjusting the ham and spam thresholds to leave a larger gray area
between them. When the Bayesian filter is not sure,
procmail then hands the message off to SpamAssassin (with Bayes filtering
turned off but all the network checks active) for more in-depth checks.
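To make the routing concrete, here is a rough sketch in Python rather than an actual procmail recipe. Everything in it is an assumption for illustration: the names `classify`, `fast_score`, and `slow_is_spam` are hypothetical stand-ins for the fast Bayesian filter and for SpamAssassin, and the two cutoffs are made-up values, not recommended settings.

```python
from typing import Callable

# Hypothetical thresholds; a real setup would tune these per filter.
HAM_CUTOFF = 0.10   # at or below this, the fast filter is sure it's ham
SPAM_CUTOFF = 0.90  # at or above this, the fast filter is sure it's spam


def classify(message: str,
             fast_score: Callable[[str], float],
             slow_is_spam: Callable[[str], bool]) -> str:
    """Return 'ham' or 'spam', calling the slow checker only for the gray area."""
    score = fast_score(message)  # assumed to be a spam probability in [0, 1]
    if score <= HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    # The fast filter is unsure: fall back to the slower, more thorough check.
    return "spam" if slow_is_spam(message) else "ham"
```

Widening the gap between the two cutoffs sends more mail to the slow checker, so the gray area is the knob that trades speed for accuracy.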
For the vast majority of messages you get quick filtering.
Only when the Bayes score is borderline do we check again with a slower but more
accurate tool. Wouldn't that be the best of both worlds - accurate and
almost as fast as the initial filter by itself?
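As a back-of-the-envelope check on "almost as fast": if every message gets the fast pass and only a fraction also gets the slow pass, the average time per message is the fast time plus that fraction of the slow time. The sketch below just computes that; the function name `average_seconds` and all the numbers are hypothetical, not measurements of any of these tools.

```python
def average_seconds(fast_s: float, slow_s: float, gray_fraction: float) -> float:
    """Mean per-message time when every message gets the fast pass and
    only gray_fraction of them also get the slow pass."""
    if not 0.0 <= gray_fraction <= 1.0:
        raise ValueError("gray_fraction must be between 0 and 1")
    return fast_s + gray_fraction * slow_s


# Illustrative numbers only: 0.25 s fast pass, 2 s slow pass, 12.5% gray area.
print(average_seconds(fast_s=0.25, slow_s=2.0, gray_fraction=0.125))  # prints 0.5
```

So the hybrid stays close to the fast filter's speed only while the gray area remains a small share of the mail.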