Another paper Posted Sep 12, 2002 12:21 UTC (Thu) by armijn (guest, #3653) Parent article: Spam avoidance techniques
At the SANE 2002 conference in the Netherlands, a paper on this subject was presented (see http://www.nluug.nl/events/sane2002/papers.html). Instead of Bayesian learning, it uses the k-nearest neighbours algorithm.
Another paper Posted Sep 12, 2002 13:17 UTC (Thu) by jrennie (guest, #3655) FYI, k-nearest neighbors (kNN) is very slow compared to filtering by rules or Bayesian approaches (such as the one Graham describes, bogofilter, and ifile). For each message you want filtered, kNN compares that message to every message in the training database. So filtering n messages is O(nm), where m is the number of training messages. Bayesian approaches scale as O(n), independent of the size of the training set.
Jason Rennie
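To make the scaling argument above concrete, here is a minimal Python sketch of both approaches. It is an illustration under stated assumptions, not the method of the SANE paper, bogofilter, or ifile: the Jaccard word-overlap similarity, k=3, the add-one smoothing, the toy training set, and all function names are hypothetical choices made for this example. It also assumes both labels appear in the training data.

```python
import math
from collections import Counter

def tokens(text):
    # Naive whitespace tokenizer; real filters tokenize far more carefully.
    return text.lower().split()

def jaccard(a, b):
    # Word-set overlap; an assumed similarity measure for this sketch.
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def knn_classify(message, training, k=3):
    # Compares the message against EVERY training message: O(m) per message,
    # hence O(nm) to filter n messages.
    words = tokens(message)
    ranked = sorted(training,
                    key=lambda item: jaccard(words, tokens(item[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def train_bayes(training):
    # One pass over the training set; all per-label totals are precomputed.
    counts = {"spam": Counter(), "ham": Counter()}
    docs = Counter()
    for text, label in training:
        counts[label].update(tokens(text))
        docs[label] += 1
    vocab = len(set(counts["spam"]) | set(counts["ham"]))
    totals = {label: sum(c.values()) for label, c in counts.items()}
    return counts, docs, totals, vocab

def bayes_classify(message, model):
    # Cost depends only on the message length, not on the number of
    # training messages: one dictionary lookup per token per label.
    counts, docs, totals, vocab = model
    n_docs = sum(docs.values())
    scores = {}
    for label in ("spam", "ham"):
        score = math.log(docs[label] / n_docs)
        for tok in tokens(message):
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((counts[label][tok] + 1) / (totals[label] + vocab))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data, invented for illustration only.
TRAINING = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("kernel patch review for monday", "ham"),
    ("lunch on monday", "ham"),
]

if __name__ == "__main__":
    model = train_bayes(TRAINING)
    for msg in ("buy cheap pills", "kernel patch for monday"):
        print(msg, "->", knn_classify(msg, TRAINING), bayes_classify(msg, model))
```

Note that the kNN function has no training step at all; the price is that every classification walks the whole training list, which is exactly the O(nm) cost described above. The Bayesian classifier pays once in train_bayes and afterwards does only per-token lookups.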
machine learning techniques Posted Sep 13, 2002 13:05 UTC (Fri) by robertb (guest, #3673) There are lots of machine learning techniques which do have reasonable test run-times (though the training run-times can be quite high for some). I'm sure we'll hear about spam filters based on other techniques over the coming years, and hopefully not based on only words or even combinations of words. (Razor may be going in this direction with its fuzzy matching techniques, which suck up swaths of text rather than individual words.)

On a different subject, it's surprising that there's been no mention of "white list keywords". I think this can be an effective technique, particularly on the individual level. (Eventually, using Bayesian techniques such as ifile, these could be generated automatically and then pruned by the individual as necessary.)
<begin plug>
Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds