<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
>

  <channel rdf:about="http://lwn.net/headlines/9460/">
    <title>LWN: Comments on "Spam avoidance techniques"</title>
    <link>http://lwn.net/Articles/9460/</link>
    <description>
This is a special feed containing comments posted
to the individual LWN article titled &quot;Spam avoidance techniques&quot;.

    </description>

    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>2</syn:updateFrequency>
    <items>
      <rdf:Seq>
	<rdf:li resource="http://lwn.net/Articles/341425/rss" />
	<rdf:li resource="http://lwn.net/Articles/195523/rss" />
	<rdf:li resource="http://lwn.net/Articles/132789/rss" />
	<rdf:li resource="http://lwn.net/Articles/122492/rss" />
	<rdf:li resource="http://lwn.net/Articles/104829/rss" />
	<rdf:li resource="http://lwn.net/Articles/83232/rss" />
	<rdf:li resource="http://lwn.net/Articles/28782/rss" />
	<rdf:li resource="http://lwn.net/Articles/16976/rss" />
	<rdf:li resource="http://lwn.net/Articles/14475/rss" />
	<rdf:li resource="http://lwn.net/Articles/10329/rss" />
	<rdf:li resource="http://lwn.net/Articles/10126/rss" />
	<rdf:li resource="http://lwn.net/Articles/9833/rss" />
	<rdf:li resource="http://lwn.net/Articles/9815/rss" />
	<rdf:li resource="http://lwn.net/Articles/9784/rss" />
	<rdf:li resource="http://lwn.net/Articles/9759/rss" />
	<rdf:li resource="http://lwn.net/Articles/9738/rss" />
	<rdf:li resource="http://lwn.net/Articles/9721/rss" />
	<rdf:li resource="http://lwn.net/Articles/9720/rss" />
	<rdf:li resource="http://lwn.net/Articles/9718/rss" />
	<rdf:li resource="http://lwn.net/Articles/9713/rss" />
	<rdf:li resource="http://lwn.net/Articles/9696/rss" />
	<rdf:li resource="http://lwn.net/Articles/9692/rss" />
	<rdf:li resource="http://lwn.net/Articles/9684/rss" />
	<rdf:li resource="http://lwn.net/Articles/9676/rss" />
	<rdf:li resource="http://lwn.net/Articles/9648/rss" />
	<rdf:li resource="http://lwn.net/Articles/9646/rss" />
	<rdf:li resource="http://lwn.net/Articles/9644/rss" />
	<rdf:li resource="http://lwn.net/Articles/9641/rss" />
      
      </rdf:Seq>
    </items>

  </channel>
    <item rdf:about="http://lwn.net/Articles/341425/rss">
      <title>Spam avoidance techniques</title>
      <link>http://lwn.net/Articles/341425/rss</link>
      <dc:date>2009-07-15T19:46:38+00:00</dc:date>
      <dc:creator>patrickcurrier</dc:creator>
      <description>
      Wow, time flies. Here's a more up-to-date list of spam filters: http://www.theyellowlists.com/TYL/Spam_Filters.html
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/195523/rss">
      <title>Spam avoidance techniques</title>
      <link>http://lwn.net/Articles/195523/rss</link>
      <dc:date>2006-08-15T12:36:44+00:00</dc:date>
      <dc:creator>andycreats</dc:creator>
      <description>
      I can recommend a good &lt;a rel=&quot;nofollow&quot; href=&quot;http://www.spam-reader.com&quot;&gt;Outlook spam filter&lt;/a&gt; - Spam Reader. I have used it for about 6 months and am very satisfied. Before that I also tried Spam Bully, and it worked fine, but there was one thing I didn't like: its options are too complicated. I needed a very simple but effective spam filter, and Spam Reader is the best in this category. If you need a fully customizable spam filter then of course you should use Spam Bully, but be ready to spend about 3-4 hours reading its help file :)
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/132789/rss">
      <title>Spam avoidance techniques</title>
      <link>http://lwn.net/Articles/132789/rss</link>
      <dc:date>2005-04-19T23:44:04+00:00</dc:date>
      <dc:creator>patrickcurrier</dc:creator>
      <description>
      Sorry, I meant to add the actual link: Spam Filter - &lt;A href=&quot;http://www.spamfilternews.com/&quot;&gt;http://www.spamfilternews.com&lt;/A&gt;
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/122492/rss">
      <title>Spam avoidance techniques</title>
      <link>http://lwn.net/Articles/122492/rss</link>
      <dc:date>2005-02-08T02:06:07+00:00</dc:date>
      <dc:creator>patrickcurrier</dc:creator>
      <description>
      By the way, there is great info about spam filters at http://www.spamfilternews.com
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/104829/rss">
      <title>Bayesian Spam avoidance possible hole ?</title>
      <link>http://lwn.net/Articles/104829/rss</link>
      <dc:date>2004-10-02T12:58:57+00:00</dc:date>
      <dc:creator>jerry</dc:creator>
      <description>
      Theoretically, it will not defeat the Bayesian filter, but in reality it does affect the filter, especially considering the impact on speed and memory usage. Personally, I think designing an algorithm that &quot;forgets&quot; rarely used tokens may be tedious or costly/impractical...&lt;br&gt;&lt;br&gt;

The approach we used in a real-world implementation, &lt;a href=&quot;http://www.spamweed.com&quot;&gt;spamweed&lt;/a&gt;, is to mix the Bayesian filter with other technologies, especially those that can extract useful information from among spammers' decoys, which significantly increases the Bayesian filter's efficiency and stability.&lt;br&gt;&lt;br&gt;

Jerry: Engineer &lt;a href=&quot;http://www.spamweed.com&quot;&gt;SpamWeed.com&lt;/a&gt;
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/83232/rss">
      <title>Spam avoidance techniques</title>
      <link>http://lwn.net/Articles/83232/rss</link>
      <dc:date>2004-05-04T09:47:57+00:00</dc:date>
      <dc:creator>RobbyRDG</dc:creator>
      <description>
      The &lt;a href=&quot;http://www.spambully.com&quot;&gt;spam filter&lt;/a&gt; that I use, Spam Bully, has multiple filtering techniques: friends/spammers lists; blocking email by country/language; blocking certain words or phrases; and RBL integration. It also lets you see detailed information about each email you receive (IP address, country, character set, and how SpamBully ranked it), and it tells you why a message was or was not blocked and how to correct this in the future.
But it always blows me away how incredibly accurate a pure Bayesian approach is. All those other methods add complexity to the process and no measurable improvement in accuracy except when you first start training your filter.

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/28782/rss">
      <title>Bayesian Spam avoidance possible hole ?</title>
      <link>http://lwn.net/Articles/28782/rss</link>
      <dc:date>2003-04-15T04:27:49+00:00</dc:date>
      <dc:creator>mattknox</dc:creator>
      <description>
      Actually, this would not work all that well, unless the spammers chose a document that is in your field.  If you get a lot of mail about scripting languages or kernel development, then an article about one of these topics might help spam get through (at least a few times).  However, if a random article that you would not normally receive in the mail was attached, it would do nothing, because the terms would neither look like spam nor like ham.  So the only way for spammers to win with this strategy is to find words that appear in mail that goes to a lot of people and that have not appeared in spam yet.  This will be, at best, an uphill battle for them.
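&lt;p&gt;A minimal sketch of why neutral padding does little, assuming the combining rule from Paul Graham's "A Plan for Spam" (only the 15 token probabilities furthest from 0.5 are combined). The function name and every probability below are made up for illustration.

```python
from math import prod

def combined_probability(token_probs, keep=15):
    """Combine the `keep` token probabilities that are furthest from a neutral 0.5."""
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:keep]
    spam = prod(interesting)
    ham = prod(1.0 - p for p in interesting)
    return spam / (spam + ham)

spam_words = [0.99] * 5          # hypothetical: tokens seen almost only in spam
neutral_padding = [0.5] * 200    # hypothetical: a random article the filter has no opinion about
on_topic_padding = [0.01] * 10   # hypothetical: words typical of this recipient's good mail

print(combined_probability(spam_words))
print(combined_probability(spam_words + neutral_padding))
print(combined_probability(spam_words + on_topic_padding))
```

Padding with words the filter has no opinion about leaves the score essentially unchanged, while padding with words that look like the recipient's own good mail drags it down, which is the "in your field" case.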
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/16976/rss">
      <title>Spam avoidance techniques</title>
      <link>http://lwn.net/Articles/16976/rss</link>
      <dc:date>2002-12-02T23:17:57+00:00</dc:date>
      <dc:creator>Waldo</dc:creator>
      <description>
      Hi,&lt;br&gt;filter training must be done with two sets of mail: archives of spam and of good mail. Someone has to classify every message into one of these. This might be done automatically with SpamAssassin or by each user. If the sysadmin does this job, he/she has to read someone's mail, and this is against the law, at least in my country. The next problem is the archive itself. It is stored private data from different individuals, and that is the next criminal act. Storing private data is not allowed; the problem is not the spam mail but the love letter to your colleague.&lt;br&gt;So the user has to classify, and this does not guarantee a better result.&lt;p&gt;Greetings from Europe
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/14475/rss">
      <title>How does online shopping work</title>
      <link>http://lwn.net/Articles/14475/rss</link>
      <dc:date>2002-11-04T01:31:47+00:00</dc:date>
      <dc:creator>mcisaac</dc:creator>
      <description>
      Great article!&lt;p&gt;I'm worried that routine activities such as online shopping might be difficult with this approach.  In the &amp;quot;definition of spam&amp;quot; section, the paper touches on what is and is not spam, referencing a merchant receipt as an example of something commercial that isn't spam.&lt;p&gt;My question is, does the receipt pass the Bayesian filter or get flagged as spam?
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/10329/rss">
      <title>Sued ?!?</title>
      <link>http://lwn.net/Articles/10329/rss</link>
      <dc:date>2002-09-19T11:08:16+00:00</dc:date>
      <dc:creator>job</dc:creator>
      <description>
      Oh, I had no idea you lived in the US. It was not an anti-American comment at all, just pro-free society, no matter where on Earth it may be. Don't take it personally.
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/10126/rss">
      <title>Bayesian Spam avoidance possible hole ?</title>
      <link>http://lwn.net/Articles/10126/rss</link>
      <dc:date>2002-09-18T06:29:18+00:00</dc:date>
      <dc:creator>guybar</dc:creator>
      <description>
      &lt;br&gt;It seems a bit stupid to ask, but what's stopping the spammers from attaching a random scientific, financial, or other serious article after the actual spam?&lt;p&gt;Wouldn't this defeat the Bayesian techniques described?&lt;br&gt;
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9833/rss">
      <title>(statistically) biased tests?</title>
      <link>http://lwn.net/Articles/9833/rss</link>
      <dc:date>2002-09-13T21:01:18+00:00</dc:date>
      <dc:creator>ElMiguel</dc:creator>
      <description>
      &lt;p&gt;But the numbers most people will remember from this article will be the ones with the 100% of lwn@lwn.net messages, since they are the ones showing the most striking advantage in favour of Bogofilter.  And, as Bockman says, that is the least realistic test case of all, since you previously optimized the filter for precisely that set of messages.  Perhaps you should make a note in the article itself to warn people who don't read the comments of that circumstance?

&lt;p&gt;(Other than that, and overlooking spamc/spamd, great articles, as always :-)).
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9815/rss">
      <title>Sued ?!?</title>
      <link>http://lwn.net/Articles/9815/rss</link>
      <dc:date>2002-09-13T16:36:54+00:00</dc:date>
      <dc:creator>gswoods</dc:creator>
      <description>
      Yeah, right. And just where is this 'free' country?&lt;p&gt;US law has lots of problems, but so does anyplace else.&lt;p&gt;Besides, this is a gratuitous anti-American comment. There's no need for&lt;br&gt;that here. We all have valid reasons for living where we do. I think it&lt;br&gt;would be stupid to move to another country just so that I could be free&lt;br&gt;to filter content for an entire organization. And it's just as easy to&lt;br&gt;argue that content filtering restricts the freedom of employees to use&lt;br&gt;the Internet. I'm not saying I agree with that argument, but you have to&lt;br&gt;be very careful when glibly tossing around the word 'free'.&lt;p&gt;Lastly, content filtering is not illegal even here. It's just that if you&lt;br&gt;filter, then you're responsible for what gets through your filters. &lt;br&gt;Personally, I would agree that's silly, but that's what the courts have ruled.
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9784/rss">
      <title>machine learning techniques</title>
      <link>http://lwn.net/Articles/9784/rss</link>
      <dc:date>2002-09-13T13:05:31+00:00</dc:date>
      <dc:creator>robertb</dc:creator>
      <description>
      There are lots of machine learning techniques which do have reasonable test run-times (the training run-times can be quite high for some, 'though).  I'm sure we'll hear about spam filters based on other techniques over the coming years, and hopefully not based on only words or even combinations of words.  (&lt;a href=&quot;http://razor.sourceforge.net/&quot;&gt;Razor&lt;/a&gt; may be going in this direction with its &lt;a href=&quot;http://lexx.shinn.net/cmeclax/nilsimsa.html&quot;&gt;fuzzy matching&lt;/a&gt; techniques (sucking up swaths of text rather than individual words).)
&lt;p&gt;
On a different subject, it's surprising that there's been no mention of &quot;white list keywords&quot;.  I think this can be an effective technique, particularly on the individual level.  (Eventually, using Bayesian techniques such as ifile, these may be able to be generated automatically and then pruned by the individual as necessary.)
&lt;p&gt;
&amp;lt;begin plug&amp;gt;&lt;br&gt;
See &lt;a href=&quot;http://www.csoft.net/~dummy/robert/software/procmail/junk&quot;&gt;this page&lt;/a&gt; for lots of spam fighting techniques/ideas.
&lt;br&gt;
&amp;lt;end plug&amp;gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9759/rss">
      <title>Sued ?!?</title>
      <link>http://lwn.net/Articles/9759/rss</link>
      <dc:date>2002-09-13T10:16:50+00:00</dc:date>
      <dc:creator>job</dc:creator>
      <description>
      It sounds like the problem is that you don't live in a free country, rather than being a problem with content filtering.
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9738/rss">
      <title>Idea for increasing effectiveness</title>
      <link>http://lwn.net/Articles/9738/rss</link>
      <dc:date>2002-09-12T22:54:07+00:00</dc:date>
      <dc:creator>gswoods</dc:creator>
      <description>
      I am curious about the legal issues. I personally am not a lawyer, but&lt;br&gt;when I have taken tutorials at conferences on Internet legal issues, I&lt;br&gt;have been warned repeatedly about content filtering. SpamAssassin and the&lt;br&gt;Bayesian filters are content filtering, because they examine the content&lt;br&gt;of the message itself and filter based on that. This is fine for the end &lt;br&gt;user to do, but if you do it as an organization, you are potentially &lt;br&gt;opening yourself up to a big liability. Remember the Prodigy case? The&lt;br&gt;ruling there was essentially that, since they were doing content filtering,&lt;br&gt;they were liable for whatever *did* get through. So if you use SpamAssassin&lt;br&gt;on the organization's mail server, and one of your employees gets a kiddie&lt;br&gt;porn spam in spite of that and is offended by it, you could be sued.&lt;p&gt;We have started using IP blacklist filters here. This is safer from the&lt;br&gt;legal point of view, because the content of the message itself is never&lt;br&gt;examined. The message is rejected before it is ever sent. Our blacklist&lt;br&gt;filters have a lot of false negatives, but the problem with false positives&lt;br&gt;has been nearly nonexistent. Also, I used to get hundreds of bounced spams&lt;br&gt;every day, and the number has dropped to nearly zero since we started filtering.&lt;p&gt;I think IP blacklists still have their place. 
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9721/rss">
      <title>Spam avoidance techniques</title>
      <link>http://lwn.net/Articles/9721/rss</link>
      <dc:date>2002-09-12T19:07:49+00:00</dc:date>
      <dc:creator>bitbytebit</dc:creator>
      <description>
      that's actually &lt;a href=&quot;http://www.freshmeat.net/projects/blackhole/&quot;&gt;BlackHole on Freshmeat&lt;/a&gt;.
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9720/rss">
      <title>Spam avoidance techniques</title>
      <link>http://lwn.net/Articles/9720/rss</link>
      <dc:date>2002-09-12T19:05:46+00:00</dc:date>
      <dc:creator>bitbytebit</dc:creator>
      <description>
      A program I have developed in C, which uses SpamAssassin and many other spam blocking techniques combined (it can even run SpamAssassin or BogoFilter), is BlackHole, available at &lt;a href=&quot;href=www.freshmeat.net/projects/blackhole/&quot;&gt;BlackHole on FreshMeat&lt;/a&gt;.

It also combines free virus checking or Sophos/McAfee/Trend Micro checking, and many more techniques for filtering email. Hopefully it can help too.

Thanks,
Chris
getdown@groovy.org
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9718/rss">
      <title>(statistically) biased tests?</title>
      <link>http://lwn.net/Articles/9718/rss</link>
      <dc:date>2002-09-12T18:26:51+00:00</dc:date>
      <dc:creator>corbet</dc:creator>
      <description>
      &lt;blockquote&gt;&quot;&lt;i&gt;A better test was maybe to train the filter with half of the data set and then test it with the other half.&lt;/i&gt;&quot;&lt;/blockquote&gt;
&lt;p&gt;
That was the first (15%) test, essentially.  And the linux-kernel test too. 
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9713/rss">
      <title>(statistically) biased tests?</title>
      <link>http://lwn.net/Articles/9713/rss</link>
      <dc:date>2002-09-12T17:49:13+00:00</dc:date>
      <dc:creator>bockman</dc:creator>
      <description>
      From the little I know of statistical filters, if you train a filter with a set
of data, then you should not use the same set of data to evaluate how good the trained filter is (since you are testing on the training data, the
filter obviously shows good results).
&lt;p&gt;
A better test was maybe to train the filter with half of the data set and then test it with the other half.
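&lt;p&gt;A minimal sketch of that half-and-half test, assuming a toy word-count classifier in place of a real filter. The corpus, function names, and scoring rule are all hypothetical and only illustrate the split.

```python
import random
from collections import Counter

def train(messages):
    """Count how often each word appears in spam and in good mail."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in messages:
        counts[label].update(text.lower().split())
    return counts

def classify(counts, text):
    """Pick the class that has seen the message's words more often (ties go to ham)."""
    words = text.lower().split()
    spam_hits = sum(counts["spam"][w] for w in words)
    ham_hits = sum(counts["ham"][w] for w in words)
    return "spam" if spam_hits > ham_hits else "ham"

def held_out_accuracy(messages, seed=0):
    """Train on one half of the data and score only the half the model never saw."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    model = train(shuffled[:half])
    test_set = shuffled[half:]
    correct = sum(classify(model, text) == label for text, label in test_set)
    return correct / len(test_set)

# Hypothetical toy corpus; a real test would use thousands of messages.
corpus = [
    ("buy cheap pills now", "spam"),
    ("cheap mortgage offer now", "spam"),
    ("win cheap pills", "spam"),
    ("mortgage offer win now", "spam"),
    ("kernel patch review", "ham"),
    ("patch for the scheduler", "ham"),
    ("review the kernel scheduler", "ham"),
    ("meeting about the patch", "ham"),
]
print(held_out_accuracy(corpus))
```

Scoring on the held-out half is what keeps the number honest: the model is never asked about a message it was trained on.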
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9696/rss">
      <title>spamc/spamd</title>
      <link>http://lwn.net/Articles/9696/rss</link>
      <dc:date>2002-09-12T14:28:41+00:00</dc:date>
      <dc:creator>corbet</dc:creator>
      <description>
      You know, mentioning (if not trying) the spamc/spamd pair was on my list 
as I put the article together, but somehow in the excitement of ordering
all those bulk email lists I dropped it.  It's really true that reading
your spam rots the brain.
&lt;p&gt;
I just ran the 5000-message linux-kernel test using spamd.  The filtering
results were the same, of course, and the run time dropped to 2400 seconds.  
That's a big speed improvement, but still an order of magnitude slower
than bogofilter.
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9692/rss">
      <title>Here we go re-inventing the ego-wheel again</title>
      <link>http://lwn.net/Articles/9692/rss</link>
      <dc:date>2002-09-12T13:55:35+00:00</dc:date>
      <dc:creator>garym</dc:creator>
      <description>
      &lt;p&gt;Remember the Freshmeat editorial on how to pick an open source project? The basic sarcastic rule was &quot;Pick something already done, and do it the same way.&quot; Bogofilter may not be mature (and the less said about buffer-overrun and similar failures in its sibling software the better), but &lt;a href=&quot;http://www.ai.mit.edu/~jrennie/ifile/&quot;&gt;iFile&lt;/a&gt; is not; it's probably unwise to chastise a net god, but even if Eric has black kettle concerns that it's not &lt;i&gt;proper&lt;/i&gt;, I can't command him, but I'd rather he just &lt;em&gt;contribute&lt;/em&gt; to the pre-existing software, standing on shoulders instead of standing on the toes of others. iFile is all ready, coded and deployed, and contributor packages include scripts for folding in with procmail or any of about half a dozen email readers, and, of course, it's free software.&lt;/p&gt;
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9684/rss">
      <title>Another paper</title>
      <link>http://lwn.net/Articles/9684/rss</link>
      <dc:date>2002-09-12T13:17:53+00:00</dc:date>
      <dc:creator>jrennie</dc:creator>
      <description>
      &lt;p&gt;FYI, k-nearest neighbors (kNN) is very slow compared to filtering by rules or Bayesian approaches (like Graham describes, bogofilter and ifile).  For each message you want filtered, kNN compares that message to all messages in the training database.  So, filtering n messages is O(nm) (where m is # of training messages).  Bayesian approaches scale as O(n).&lt;/p&gt;
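&lt;p&gt;A minimal sketch of where the O(nm) cost comes from, assuming a toy kNN that measures similarity by word overlap. The function names and sample messages are hypothetical; a real kNN filter would use a better distance measure.&lt;/p&gt;

```python
from collections import Counter

def jaccard(a, b):
    """Word-overlap similarity between two messages, from 0.0 to 1.0."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a.union(words_b)
    return len(words_a.intersection(words_b)) / len(union) if union else 0.0

def knn_classify(training, text, k=3):
    """Label text by majority vote of its k most similar training messages.

    Returns the label and the number of message comparisons performed.
    """
    scored = [(jaccard(text, stored), label) for stored, label in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0], len(scored)

# Hypothetical training database (m = 4) and incoming mail (n = 2).
training = [
    ("cheap pills online", "spam"),
    ("win money now", "spam"),
    ("kernel patch review", "ham"),
    ("meeting notes attached", "ham"),
]
incoming = ["cheap pills", "patch review please"]

# Every incoming message is compared with every stored message: n * m in total.
total = sum(knn_classify(training, msg, k=1)[1] for msg in incoming)
print(total)
```

A trained Bayesian filter, by contrast, only looks up each word of the incoming message in a precomputed table, so its per-message cost does not grow with the number of training messages.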

&lt;p&gt;
Jason Rennie&lt;br&gt;
Author of &lt;a href=&quot;http://www.ai.mit.edu/~jrennie/ifile/&quot;&gt;ifile&lt;/a&gt; - the original intelligent e-mail filter
&lt;/p&gt;
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9676/rss">
      <title>Another paper</title>
      <link>http://lwn.net/Articles/9676/rss</link>
      <dc:date>2002-09-12T12:21:28+00:00</dc:date>
      <dc:creator>armijn</dc:creator>
      <description>
      At the SANE 2002 conference in the Netherlands a paper was presented&lt;br&gt;about a self-learning, content-based spam filter. According to the paper, you&lt;br&gt;can get quite good results with it:&lt;p&gt;http://www.nluug.nl/events/sane2002/papers.html&lt;p&gt;Instead of Bayesian learning it uses the k-nearest neighbours algorithm.
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9648/rss">
      <title>Spam avoidance techniques</title>
      <link>http://lwn.net/Articles/9648/rss</link>
      <dc:date>2002-09-12T07:36:47+00:00</dc:date>
      <dc:creator>Dom2</dc:creator>
      <description>
      I do this and it makes spamassassin usable over my dialup.  At this point, most of my runtime costs are DNS lookups from spamd.&lt;p&gt;-Dom
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9646/rss">
      <title>Spam avoidance techniques</title>
      <link>http://lwn.net/Articles/9646/rss</link>
      <dc:date>2002-09-12T05:35:07+00:00</dc:date>
      <dc:creator>fcrozat</dc:creator>
      <description>
      Since SpamAssassin is written in Perl, when you use it through procmail, a new Perl interpreter is started for each message. This is the main cause of the high &quot;run time&quot; figure.&lt;p&gt;To prevent this from happening, you should use the spamc/spamd tools which are shipped with SpamAssassin:&lt;p&gt;Spamd is a daemon which starts a spamassassin process that is kept in memory, fixing the startup latency problem of SpamAssassin.&lt;br&gt;Spamc is a client which connects to spamd and can be used instead of spamassassin in procmail rules (replace &quot;spamassassin -P&quot; with &quot;spamc&quot;).&lt;p&gt;You should try it; the runtime will probably be more reasonable.
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9644/rss">
      <title>Spam avoidance techniques</title>
      <link>http://lwn.net/Articles/9644/rss</link>
      <dc:date>2002-09-12T04:12:33+00:00</dc:date>
      <dc:creator>dwheeler</dc:creator>
      <description>
      It's certainly reasonable to combine multiple anti-spam techniques;
in fact, a lot of people do exactly this.
&lt;p&gt;
Obviously, a lot of people only learned about this technique
from
&lt;a href=&quot;http://www.paulgraham.com/spam.html&quot;&gt;Paul Graham's plan for spam&lt;/a&gt; (a well-written piece!).
The LWN study shown here is wonderful confirmation that
it has value.
It's worth noting that there have been other studies
on the topic, including
&lt;a href=&quot;http://arxiv.org/abs/cs.CL/0006013&quot;&gt;An evaluation of Naive Bayesian anti-spam filtering&lt;/a&gt;,
&lt;a href=&quot;http://arxiv.org/abs/cs/0008019&quot;&gt;An Experimental Comparison of
Naive Bayesian and Keyword-Based Anti-Spam Filtering
with Personal E-mail Messages&lt;/a&gt;,
&lt;a href=&quot;http://arxiv.org/abs/cs/0009009&quot;&gt;Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach&lt;/a&gt;,
and information from
&lt;a href=&quot;http://www.lsi.upc.es/~carreras/pub/boospamev.ps&quot;&gt;lsi.upc.es&lt;/a&gt; and
&lt;a href=&quot;http://www.monmouth.edu/~drucker/SVM_spam_article_compete.PDF&quot;&gt;monmouth.edu&lt;/a&gt;;
&lt;a href=&quot;http://developers.slashdot.org/developers/02/08/16/1428238.shtml?tid=156&quot;&gt;Slashdot&lt;/a&gt; has carried a discussion about it.
&lt;a href=&quot;http://www.ai.mit.edu/~jrennie/ifile/&quot;&gt;Ifile implemented the idea&lt;/a&gt;
many years ago - it claims a first release date of
Aug  3 20:49:01 EDT 1996, and the author doesn't claim that this
program is the first implementation of the idea, either.
&lt;p&gt;
A selected set from the
newsgroup news.admin.net-abuse.sightings might be useful for initial
training of a spam filter.
That would eliminate the problem you mention.
&lt;p&gt;
I think every email reader should have a &quot;big SPAM button&quot;
that adds an email to the &quot;spam&quot; folder (so it can be used for
future analysis), as well as other configurable actions. See
&lt;a href=&quot;http://www.dwheeler.com/essays/stopspam.html&quot;&gt;
http://www.dwheeler.com/essays/stopspam.html&lt;/a&gt; for more
information about this.


      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/9641/rss">
      <title>Idea for increasing effectiveness</title>
      <link>http://lwn.net/Articles/9641/rss</link>
      <dc:date>2002-09-12T03:25:39+00:00</dc:date>
      <dc:creator>Strike</dc:creator>
      <description>
      Maybe I'm crazy, but I don't see why you can't simply daisy-chain the two together to provide even better results. This way you can tweak SpamAssassin to a target score high enough that it won't produce false positives (I've found that an aggregate score of 8 or so, without changing any of the test scores, does a fine job, though it does miss a few). Then the mails that have gone to great enough lengths to keep the score low on all the header tests, MX tests, subject tests, and content (such as MIME type) tests that SpamAssassin does will be subject to the Bayesian approach as well.&lt;p&gt;This way, spam mails that are clever enough to pass one but not the other will be tossed aside.
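&lt;p&gt;A minimal sketch of that daisy-chain, assuming two pluggable scoring functions. The rule threshold of 8 comes from the comment above; the 0.9 Bayesian cutoff and both toy scorers are hypothetical stand-ins, not SpamAssassin or bogofilter.

```python
def daisy_chain(message, rule_score, bayes_probability,
                rule_threshold=8.0, bayes_threshold=0.9):
    """Flag a message as spam if either stage catches it."""
    if rule_score(message) >= rule_threshold:
        return "spam"  # caught by the rule-based stage
    if bayes_probability(message) >= bayes_threshold:
        return "spam"  # slipped past the rules, caught by the statistical stage
    return "ham"

def toy_rule_score(message):
    """Hypothetical stand-in for a rule-based scorer: 5 points per shouted word."""
    return 5.0 * sum(word.isupper() for word in message.split())

def toy_bayes_probability(message):
    """Hypothetical stand-in for a trained Bayesian filter."""
    spammy = {"viagra", "mortgage", "winner"}
    words = message.lower().split()
    hits = sum(word in spammy for word in words)
    return hits / len(words) if words else 0.0

for text in ["FREE MONEY now", "viagra winner", "kernel patch attached"]:
    print(text, daisy_chain(text, toy_rule_score, toy_bayes_probability))
```

Running the rule-based stage first with a conservative threshold keeps its false positives low, and the second stage only has to deal with whatever was crafted to slip past the rules.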
      
      </description>
    </item>
</rdf:RDF>

