Training and tweaking

Posted Mar 2, 2006 16:22 UTC (Thu) by corbet (editor, #1)
In reply to: A grumpy editor's bayesian followup by glouis
Parent article: A grumpy editor's bayesian followup

FWIW, the filters *were* "carefully trained." Just over 2000 messages were pulled out of the stream and used only for that purpose. They were well inspected to avoid mistraining the filter. How much more careful does one need to be?

I did avoid tweaking the various knobs exported by some of the filters, with the well-documented SpamAssassin exception. I believe that was the right choice: most users (even those who are not "newbies") are unlikely to mess with them, and the defaults should be reasonable.



Training and tweaking

Posted Mar 2, 2006 17:07 UTC (Thu) by glouis (guest, #526)

We could discuss this further here, but I'd like to refer you to http://www.bgl.nu/bogofilter/tuning.html, which details a process by which bogofilter can be optimized.
The process is onerous and requires a considerably larger number of messages than the 2,000 each that you mention. It can also be well worth the effort for a prospective production user. I don't mean in any way to imply that you should have done all that, plus the equivalent for your other tested packages, by yourself; it would have taken you months. I just want to avoid leaving people with the impression that your table reflects the performance of which bogofilter is capable.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds