Increasing BAYES_99 score can be very dangerous
Posted Mar 2, 2006 5:33 UTC (Thu) by fyodor (guest, #3481)
Parent article: A grumpy editor's bayesian followup
Jonathan does a great job with this and his other "grumpy editor" articles, but I cannot agree with his repeated suggestion to raise the BAYES_99 rule score to the point of "allowing the bayesian filter to condemn mail on its own". The long description for that rule is "Bayesian spam probability is 99 to 100%". In other words, you may very well see 1% or more of your legitimate mail falling into that bucket.
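To make the risk concrete, here is a minimal Python sketch of SpamAssassin-style additive scoring: a message is flagged once the sum of its rule scores reaches the threshold. It assumes the stock required_score of 5.0; the BAYES_99 scores of 3.5 and 5.0 are hypothetical values chosen for illustration, not the shipped defaults.

```python
# Assumed threshold: SpamAssassin's default required_score.
REQUIRED_SCORE = 5.0


def total_score(hits, scores):
    """Sum the scores of every rule the message triggered (unknown rules count 0)."""
    return sum(scores.get(rule, 0.0) for rule in hits)


def is_spam(hits, scores, required=REQUIRED_SCORE):
    """Flag the message when its total score reaches the threshold."""
    return total_score(hits, scores) >= required


# A legitimate message that trips only the Bayes rule.
ham_hits = ["BAYES_99"]

default_scores = {"BAYES_99": 3.5}  # hypothetical: below the threshold
raised_scores = {"BAYES_99": 5.0}   # hypothetical: Bayes "condemns on its own"

print(is_spam(ham_hits, default_scores))  # False
print(is_spam(ham_hits, raised_scores))   # True
```

With the lower score, a ham message needs help from other rules to be misfiled; once BAYES_99 alone meets the threshold, every legitimate message the classifier puts in that bucket goes straight to the spam folder.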
Of about 1,850 non-spam mails I felt were important enough to keep in February, 36 of them (2%) were BAYES_99 false positives. And I'm probably missing some that were wrongly moved to my spam folder (which gets tens of thousands of messages per day, so I never check it). These include many mails from non-hacker friends and neighbors, many legitimate sales-related queries, party invitations, Nmap questions, and all sorts of other things. I'm also reasonably good about training the filter when I catch mistakes. The more diverse your mail spool is, the tougher the filter's job will be. It may work great on Jon's mail, but be sure to test it on your own before blithely following his advice and increasing the BAYES_99 score.
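One way to do that testing is to label a month of your own kept mail and measure how much of the ham hit BAYES_99. The sketch below assumes you already have, for each message, a hand-checked "is ham" flag and whether BAYES_99 fired; how you extract those (for example from the X-Spam-Status header) is left open, and the function name is hypothetical.

```python
def false_positive_rate(messages):
    """Fraction of hand-verified ham that hit BAYES_99.

    `messages` is a list of (is_ham, hit_bayes_99) boolean pairs that you
    have labelled yourself.
    """
    ham = [hit for is_ham, hit in messages if is_ham]
    if not ham:
        return 0.0
    # True counts as 1 and False as 0, so the sum is the number of hits.
    return sum(ham) / len(ham)


# The February numbers above: 36 of about 1,850 kept ham messages hit BAYES_99.
sample = [(True, True)] * 36 + [(True, False)] * (1850 - 36)
rate = false_positive_rate(sample)
print(f"{rate:.1%}")  # 1.9%
```

Because it only counts ham you actually noticed, this figure is a lower bound: mail silently dumped into an unchecked spam folder never makes it into the sample.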
-Fyodor