
Another paper


Posted Sep 12, 2002 13:17 UTC (Thu) by jrennie (guest, #3655)
In reply to: Another paper by armijn
Parent article: Spam avoidance techniques

FYI, k-nearest neighbors (kNN) is very slow compared with filtering by rules or with Bayesian approaches (like the one Graham describes, bogofilter, and ifile). For each message you want filtered, kNN compares that message with every message in the training database, so filtering n messages is O(nm), where m is the number of training messages. Bayesian approaches scale as O(n): the per-message cost does not grow with the size of the training set.
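To make the cost difference concrete, here is a toy sketch of both approaches. Everything in it is an assumption for illustration: a whitespace tokenizer, cosine similarity with k=3 for kNN, add-one smoothed multinomial naive Bayes, and made-up training data. It is not the code of ifile, bogofilter, or Graham's filter (Graham combines per-token probabilities differently). The point is structural: `KNNFilter.classify` loops over every stored message (O(m) per message), while `NaiveBayesFilter.classify` does one table lookup per token and class, independent of m.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Toy tokenizer (an assumption): lowercase, split on whitespace.
    return text.lower().split()


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity of two token-count vectors (missing Counter keys read as 0).
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KNNFilter:
    def __init__(self, training, k=3):
        # "Training" just stores every message as a token-count vector.
        self.examples = [(Counter(tokenize(t)), label) for t, label in training]
        self.k = k

    def classify(self, text):
        query = Counter(tokenize(text))
        # One similarity computation per stored message: O(m) per filtered message.
        sims = [(cosine(query, vec), label) for vec, label in self.examples]
        sims.sort(key=lambda pair: pair[0], reverse=True)
        votes = Counter(label for _, label in sims[: self.k])
        return votes.most_common(1)[0][0]


class NaiveBayesFilter:
    def __init__(self, training):
        # Training pass: count tokens per class once, up front.
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        for text, label in training:
            self.token_counts[label].update(tokenize(text))
            self.doc_counts[label] += 1
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        self.vocab_size = len(vocab)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}

    def classify(self, text):
        # Per message: one lookup per token and class, independent of m.
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c, counts in self.token_counts.items():
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(text):
                # Add-one smoothing so an unseen token doesn't zero out a class.
                score += math.log((counts[tok] + 1) / (self.totals[c] + self.vocab_size))
            scores[c] = score
        return max(scores, key=scores.get)


# Made-up example data (an assumption, not a real corpus).
TRAINING = [
    ("buy cheap pills now", "spam"),
    ("cheap pills offer now", "spam"),
    ("win cash now", "spam"),
    ("meeting notes for monday", "ham"),
    ("lunch with the team monday", "ham"),
    ("draft of the kernel patch", "ham"),
]
```

With this data, `KNNFilter(TRAINING).classify("cheap pills offer")` and `NaiveBayesFilter(TRAINING).classify("cheap pills offer")` both return `"spam"`. Doubling the training set doubles the kNN work per message but leaves the naive Bayes lookup cost unchanged.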

Jason Rennie
Author of ifile - the original intelligent e-mail filter



machine learning techniques

Posted Sep 13, 2002 13:05 UTC (Fri) by robertb (guest, #3673)

There are lots of machine learning techniques that do have reasonable test run-times (though the training run-times can be quite high for some). I'm sure we'll hear about spam filters based on other techniques over the coming years, hopefully ones that don't rely solely on words or even combinations of words. (Razor may be moving in this direction with its fuzzy matching, which takes in swaths of text rather than individual words.)

On a different subject, it's surprising that there's been no mention of "white list keywords": words that, when they appear in a message, mark it as legitimate regardless of what the spam filter says. I think this can be an effective technique, particularly at the individual level. (Eventually, with Bayesian tools such as ifile, these keywords could be generated automatically and then pruned by the individual as necessary.)
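A minimal sketch of how white list keywords might sit in front of a spam filter, plus one naive way to suggest candidates automatically. The function names, the whitespace tokenizer, and the `min_ham` threshold are all hypothetical. `suggest_keywords` is a simple "common in good mail, never seen in spam" heuristic, not ifile's method; its output is meant to be pruned by the user before use.

```python
from collections import Counter


def tokenize(text: str) -> set[str]:
    # Toy tokenizer (an assumption): lowercase, whitespace-split, deduplicated.
    return set(text.lower().split())


def is_whitelisted(message: str, keywords: set[str]) -> bool:
    """True if the message contains any user-chosen white list keyword."""
    return not tokenize(message).isdisjoint(keywords)


def suggest_keywords(ham: list[str], spam: list[str], min_ham: int = 2) -> set[str]:
    """Hypothetical auto-generation: tokens in >= min_ham good messages and in no spam."""
    ham_df = Counter(tok for msg in ham for tok in tokenize(msg))
    spam_df = Counter(tok for msg in spam for tok in tokenize(msg))
    return {tok for tok, n in ham_df.items() if n >= min_ham and spam_df[tok] == 0}


def filter_message(message, keywords, spam_filter):
    # The white list check runs first; only non-whitelisted mail reaches the filter.
    if is_whitelisted(message, keywords):
        return "ham"
    return spam_filter(message)
```

For example, with good mail `["bogofilter patch for review", "review the ifile patch", "lunch friday"]` and spam `["cheap pills", "review our cheap offer"]`, `suggest_keywords` proposes only `{"patch"}`: "review" is common in good mail but also appears in spam, so it is left out.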

<begin plug>
See this page for lots of spam fighting techniques/ideas.
<end plug>

