A grumpy editor's bayesian followup
Posted Mar 2, 2006 14:59 UTC (Thu) by
zmi (subscriber, #4829)
Parent article:
A grumpy editor's bayesian followup
Hi, I got this reply on the spamassassin user mailing list
( users-subscribe@spamassassin.apache.org ):
Statistically speaking, 1% of BAYES_99 hits should be nonspam. In reality,
it does a lot better than that.
However, in the SA 3.1.0 set3 mass checks it still managed to match
about 21 messages in the nonspam test set:
OVERALL%    SPAM%     HAM%    S/O   RANK  SCORE  NAME
  176869   123778    53091  0.700   0.00   0.00  (all messages)
  60.712  86.7351   0.0396  1.000   0.90   3.50  BAYES_99
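As a quick sanity check on those figures, the percentages in the BAYES_99 row can be turned back into message counts using the totals in the "(all messages)" row. This is only a sketch of the arithmetic; the variable names are made up for illustration and are not part of any SpamAssassin tool.

```python
# Counts from the "(all messages)" row of the mass-check table.
total_msgs, spam_msgs, ham_msgs = 176869, 123778, 53091

# Percentages from the BAYES_99 row.
spam_pct, ham_pct = 86.7351, 0.0396

# Convert percentages back into approximate message counts.
ham_hits = round(ham_msgs * ham_pct / 100)
spam_hits = round(spam_msgs * spam_pct / 100)

# Share of all BAYES_99 hits that landed on nonspam.
false_hit_share = ham_hits / (ham_hits + spam_hits)

print("nonspam hits:", ham_hits)
print("spam hits:", spam_hits)
print("share of hits that were nonspam: {:.4%}".format(false_hit_share))
```

The nonspam hits come out to about 21 messages, and they make up a far smaller share of all BAYES_99 hits than the nominal 1%.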
SA's scores aren't based on human assumptions about how the rules
behave. They are based on real-world testing and a perceptron
score-fitting system that accounts not only for the hit-rate of the
rule, but also for the combinations of rules that it tends to match
with. Often the reality is a lot more complex than you think.
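
To give a feel for what fitting scores to rule combinations means, here is a toy perceptron in Python. It is not SpamAssassin's actual score generator; the rule names (RULE_A, RULE_B, RULE_C) and the seven sample messages are invented for illustration. In this made-up data, RULE_A alone shows up on nonspam, but RULE_A together with another rule only shows up on spam, so the fitted weights have to reflect the combinations and not just each rule's own hit rate.

```python
# Hypothetical rule names, not real SpamAssassin rules.
RULES = ["RULE_A", "RULE_B", "RULE_C"]

# Invented sample: (which rules hit, label) with 1 = spam, 0 = nonspam.
MESSAGES = [
    ([1, 1, 0], 1),
    ([1, 0, 1], 1),
    ([1, 1, 1], 1),
    ([1, 0, 0], 0),
    ([0, 1, 0], 0),
    ([0, 0, 1], 0),
    ([0, 0, 0], 0),
]


def predict(weights, bias, hits):
    """Return 1 (spam) if the summed score of the hit rules is positive."""
    total = bias + sum(w * h for w, h in zip(weights, hits))
    return 1 if total > 0 else 0


def fit_perceptron(messages, max_epochs=1000):
    """Classic perceptron rule: nudge weights only on misclassified messages."""
    weights = [0.0] * len(messages[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for hits, label in messages:
            error = label - predict(weights, bias, hits)
            if error != 0:
                mistakes += 1
                bias += error
                for i, h in enumerate(hits):
                    weights[i] += error * h
        if mistakes == 0:
            break  # a full pass with no mistakes: every message is classified correctly
    return weights, bias


weights, bias = fit_perceptron(MESSAGES)
for name, w in zip(RULES, weights):
    print(name, w)
print("bias", bias)
```

After fitting, a message that hits only RULE_A is still classified as nonspam, while RULE_A plus either of the other rules is classified as spam.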