TREC Spam Tests

Posted Apr 29, 2006 20:59 UTC (Sat) by gvc (guest, #37441)
Parent article: A grumpy editor's bayesian followup

Could I convince you to prepare your corpus for use with the TREC spam evaluation toolkit? TREC ran a Spam Track in 2005 and will run another in 2006. It uses a standard interface to test filters, so the data can be kept private.

The toolkit web page has setups for eight open source filters, including most of the ones you tested. If you were to adopt the toolkit, you could run those filters and post comparative results, and you could also test new filters.

I invite you and others to participate in TREC 2006. A letter of intent is needed right away (the official deadline has passed, but that's OK). "Normal" participation involves submitting a filter and also running the same filter on a public corpus that will be released. The submitted filter is run on blind datasets. There are also "special" participants, who test the filters on their own private data and send in the results.


Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds