TREC Spam Tests
Posted Apr 29, 2006 20:59 UTC (Sat) by gvc
Parent article: A grumpy editor's bayesian followup
Could I convince you to prepare your corpus for use with the TREC spam evaluation toolkit? TREC ran a Spam Track in 2005 and will run another in 2006. The toolkit uses a standard interface to test filters, so the data can be kept private.
The toolkit web page has setups for eight open source filters, including most of the ones you tested. If you were to set up the toolkit, you could run those filters and post comparative results, and you could also test new filters.
I invite you and others to participate in TREC 2006. A letter of intent is needed right away (the official deadline has passed, but that's OK). "Normal" participation involves submitting a filter and also running the same filter on a public corpus that will be released. Because the submitted filter is run on blind datasets, there are also "special" participants who test the filters on their own private data and send in the results.