(statistically) biased tests?

Posted Sep 12, 2002 17:49 UTC (Thu) by bockman (guest, #3650)
Parent article: Spam avoidance techniques

From the little I know of statistical filters, if you train a filter on a set of data, you should not use that same set to evaluate how good the trained filter is (since you are testing on the training data, the filter will obviously show good results).

A better test might be to train the filter on half of the data set and then test it on the other half.
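
As a rough illustration of the half-and-half idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names (`holdout_split`, `train`, `classify`, `accuracy`) are hypothetical, the messages are made up, and the token-counting classifier is only a toy stand-in, not Bogofilter's actual algorithm. The point is just that the messages used for scoring are kept disjoint from the ones used for training.

```python
import random

def holdout_split(messages, train_fraction=0.5, seed=0):
    """Shuffle labelled messages and split them into disjoint train/test lists."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train(messages):
    """Toy stand-in for a statistical filter: count tokens seen in each class."""
    counts = {"spam": {}, "ham": {}}
    for text, label in messages:
        for token in text.lower().split():
            counts[label][token] = counts[label].get(token, 0) + 1
    return counts

def classify(counts, text):
    """Call a message spam if its tokens were seen more often in spam than in ham."""
    score = 0
    for token in text.lower().split():
        score += counts["spam"].get(token, 0) - counts["ham"].get(token, 0)
    return "spam" if score > 0 else "ham"

def accuracy(counts, messages):
    """Fraction of labelled messages that the trained counts classify correctly."""
    correct = sum(1 for text, label in messages if classify(counts, text) == label)
    return correct / len(messages)

if __name__ == "__main__":
    # Made-up corpus of (text, label) pairs, for illustration only.
    corpus = [
        ("buy cheap pills now", "spam"),
        ("cheap loans now", "spam"),
        ("kernel patch review", "ham"),
        ("patch for the scheduler", "ham"),
    ]
    train_set, test_set = holdout_split(corpus)
    model = train(train_set)
    # Scoring on train_set is the biased test; test_set is the fairer one.
    print("accuracy on training half:", accuracy(model, train_set))
    print("accuracy on held-out half:", accuracy(model, test_set))
```

With a realistically sized corpus, the gap between the two printed numbers gives a feel for how much testing on the training data flatters the filter.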


(statistically) biased tests?

Posted Sep 12, 2002 18:26 UTC (Thu) by corbet (editor, #1)

"A better test might be to train the filter on half of the data set and then test it on the other half."

That was the first (15%) test, essentially. And the linux-kernel test too.

(statistically) biased tests?

Posted Sep 13, 2002 21:01 UTC (Fri) by ElMiguel (guest, #741)

But the numbers most people will remember from this article are the ones from the test with 100% of the lwn@lwn.net messages, since those show the most striking advantage in favour of Bogofilter. And, as Bockman says, that is the least realistic test case of all, since the filter had previously been optimized for precisely that set of messages. Perhaps you should add a note to the article itself to warn people who don't read the comments about that circumstance?

(Other than that, and overlooking spamc/spamd, great articles, as always :-).)


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds