
SpamBayes 1.0 released

From:  "Tony Meyer" <ta-meyer-AT-ihug.co.nz>
To:  <spambayes-announce-AT-python.org>, <python-announce-AT-python.org>
Subject:  ANNOUNCE: SpamBayes release 1.0
Date:  Tue, 28 Sep 2004 19:46:28 +1200
Cc:  spambayes-AT-python.org, spambayes-dev-AT-python.org

The SpamBayes team is pleased to announce the 1.0 release of SpamBayes.

As is now usual, this is both a release of the source code and of an
installation program for all Microsoft Windows users.

The Windows installation program will install either the Outlook add-in (for
Microsoft Outlook users), or the SpamBayes server program (for all other
POP3 mail client users, including Microsoft Outlook Express). All Windows
users (including existing users of the Outlook add-in) are encouraged to use
the installation program.

If you wish to use the source-code version, you will also need to install
Python - see README.txt in the source tree for more information.

This release includes no changes from the successful (but now rather dated)
1.0rc2 release.  However, we still highly recommend that existing users
upgrade to the final version.  Work has already begun towards the first 1.1
release, and we expect to issue a bug-fix-only 1.0.1 release around the
same time as 1.1a1.

September 2004 is SpamBayes' second birthday, and (as many users know) we have
gone through a very long release process, including 8 alpha releases, a
beta, and two release candidates, all tested by a large number of users.  As
such, we are very confident that this 1.0 release is stable and suitable for
regular use.  We do welcome any and all contributions for improvements, of
course!

Get it via the 'Download' page at

    http://spambayes.org/download.html

Enjoy the new release and your spam-free mailbox :-)

Thanks to everyone involved in this release, particularly Richie Hindle and
Kenny Pitt!

Tony.
(on behalf of the SpamBayes team)

--- What is SpamBayes? ---

The SpamBayes project is working on developing a Bayesian (of sorts)
anti-spam filter (in Python), initially based on the work of Paul Graham.
The major difference between this and other, similar, projects is the
emphasis on testing newer approaches to scoring messages.
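The sketch below illustrates what "Bayesian (of sorts)" means. It follows the naive combining scheme from Paul Graham's 2002 essay "A Plan for Spam":

1. Compute a spam probability for each token from the training counts.
2. Keep the 15 tokens whose probabilities are farthest from 0.5.
3. Combine those into a single score.

The tokenizer and function names are illustrative assumptions, and the constants (0.4 for rarely seen tokens, a minimum of five occurrences, doubled ham counts, clamping to [0.01, 0.99]) follow Graham's description. This is not SpamBayes' own scoring code. As the paragraph above notes, the project's emphasis is on testing newer approaches to scoring.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer: lowercase runs of letters, digits, apostrophes, dashes and $."""
    return re.findall(r"[a-z0-9'$-]+", text.lower())


def count_tokens(messages):
    """Count token occurrences across a corpus of message bodies."""
    counts = Counter()
    for body in messages:
        counts.update(tokenize(body))
    return counts


def token_prob(token, spam_counts, ham_counts, nspam, nham, min_count=5):
    """Graham-style probability that a message containing `token` is spam.

    spam_counts/ham_counts are Counters; nspam/nham are message counts (> 0).
    """
    bad = spam_counts[token]
    good = 2 * ham_counts[token]  # ham counts doubled to bias against false positives
    if good + bad < max(min_count, 1):
        return 0.4  # rarely seen tokens get a slightly ham-leaning default
    bad_ratio = min(1.0, bad / nspam)
    good_ratio = min(1.0, good / nham)
    return max(0.01, min(0.99, bad_ratio / (good_ratio + bad_ratio)))


def spam_score(text, spam_counts, ham_counts, nspam, nham,
               min_count=5, n_interesting=15):
    """Combine the most extreme token probabilities into one score in [0, 1]."""
    probs = [token_prob(t, spam_counts, ham_counts, nspam, nham, min_count)
             for t in set(tokenize(text))]
    # "Interesting" tokens are those whose probability is farthest from 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:n_interesting]
    spammy = math.prod(top)
    hammy = math.prod(1.0 - p for p in top)
    return spammy / (spammy + hammy)  # 0.5 when the text has no tokens at all
```

Training amounts to calling `count_tokens` on a spam corpus and on a ham corpus. Scoring a new message then needs only those two Counters and the two corpus sizes. Doubling the ham counts and clamping the probabilities are Graham's guards against false positives, the concern raised in the comments below.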

The project includes a number of different applications, all using the same
core code, ranging from a plug-in for Microsoft Outlook, to a POP3 proxy, to
various command-line tools.

-- 
http://mail.python.org/mailman/listinfo/python-announce-list

        Support the Python Software Foundation:
        http://www.python.org/psf/donations.html



SpamBayes 1.0 released

Posted Sep 29, 2004 22:41 UTC (Wed) by rmstar (guest, #3672)

Neat.

...although I have the impression that, in terms of percentage, blocklists are doing a much better job of fighting spam than any of these statistical methods, simply because blacklists do not engage in mathematical/psychological games with spammers. As long as spammers have even a remote chance of getting their message through, they will try. And they are trying.

The current spam crisis might well be caused to some extent by the widespread use of statistical methods. Firstly, you still have to wade through all the spam to identify false positives, and if you receive megabytes of spam, you will undoubtedly miss some. The only difference is that the sender will not get an error message. Secondly, wading through all that spam essentially means that you are looking at it just the same. The fact that it is in its own folder does not make any difference.

And let's face it: none of these methods has a remote chance of actually grokking the difference between spam and legit mail, whereas a good blocklist like Spamhaus's has a real human being looking at the evidence; undoubtedly a superior form of intelligence to any of these programs, be it SpamBayes or SpamAssassin etc. Don't get me wrong: I am not slighting the amazing work of the SpamBayes developers, just stating as a fact that this is not the real AI needed to actually tell the difference.

As an element in an anti-spam strategy, I'm sure SpamBayes can play a very important role, but only as a complement to a good blocklist.

Just my 2.15468 cents - rmstar

SpamBayes 1.0 released

Posted Sep 30, 2004 12:17 UTC (Thu) by dlang (✭ supporter ✭, #313)

looking at the spam I receive, I would conclude just about the opposite of what you do.

I would say that the attempts to use blocklists are causing more spam.

blocklists tend to be very fragile, so if the spammer just keeps creating more variants of his messages, they tend to get through.

the statistical approaches look at more of the content and are not thrown off by minor changes to the matching strings

I would suggest that you actually try a statistical filter for a while before you say they have no chance of identifying spam

and let's face it, if it is absolutely impossible for any computer program to identify spam, then it is also impossible for any person to identify spam.

if a person can tell the difference between spam and their normal mail, then it is possible for a computer program to do the same thing. the question is just how to tell the computer to identify the message. statistical filtering isn't perfect, but neither is human spam filtering

SpamBayes 1.0 released

Posted Sep 30, 2004 16:46 UTC (Thu) by rmstar (guest, #3672)

> blocklists tend to be very fragile so if the spammer just keeps creating more variants of his messages they tend to get through.

I think you have a misconception about what blocklists do. They block IP addresses; they do not try to do substring matching.
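IP-based blocklists like the one described here are normally queried over DNS. The mail server reverses the octets of the connecting IPv4 address, appends the list's zone, and looks the name up. An answer means the address is listed, and a "no such name" result means it is not. The standard-library sketch below shows this. The zone `sbl.spamhaus.org` is only an illustration, and treating any answer as a listing is a simplification: real lists document specific return codes and terms of use.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # raises ValueError if invalid
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone="sbl.spamhaus.org"):
    """Return True if the zone answers for this IP (simplification: any answer = listed)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no such name (or another resolver failure): treat as not listed
    return True
```

For example, `dnsbl_query_name("192.0.2.1", "sbl.spamhaus.org")` builds `1.2.0.192.sbl.spamhaus.org`. The lookup is keyed on the sending address rather than the message text, so rewording a spam does not get it past the list.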

You say that if a human can distinguish between spam and non-spam, so can a machine. That is wrong, at least with current technology. I sometimes get spam of the form "Hi sweethart - I'm longing to see you gain. Call me at <some phone number>". Whether such a mail is legit or not really depends on who the recipient and the sender are, and whether they know each other.

SpamBayes 1.0 released

Posted Sep 30, 2004 12:28 UTC (Thu) by RobSeace (subscriber, #4435)

> Secondly, just wading through all that spam essentially means that you are
> looking at it just the same. The fact that it is on its own folder does not
> make any difference.

That's really not true at all... It makes a very large difference, in fact...
It's really rather easy to spot the one good message buried in a pile of
spam... Just as it's pretty easy to spot the one spam in a pile of good
messages... But, it's a MUCH harder task to sort spam from non-spam in a
completely mixed mailbox, with roughly equal amounts of each... Such a
mixed environment requires very close attention be paid to every single
message to make a determination... But, when dealing with a collection of
messages that are overwhelmingly of one particular variety, you can just go
into a much quicker scanning-for-anomalies mode... One good message in a
pile of spam really does stick out like a sore thumb... I've experienced
it on a few occasions... (Though, not that often, because despite your
implications, the bayesian filters I've used really don't make too many
false-positives... Very rarely, someone might send something full of HTML
or some other junk, which trips up the filter, but at least for the mail I
receive, it's been extremely rare...)

> none of these methods has a remote chance of actually groking the
> difference between spam and legit mail

They may not UNDERSTAND the difference, but they sure do seem to be able to
differentiate anyway, despite being simplistic and relatively stupid little
algorithms... Personally, I don't really care if my spam-filter has an
intelligent grasp on what spam is; as long as it successfully weeds out the
spam, and leaves the good messages, it can be as dumb as Dubya, and be
using astrology and numerology to make the determination, for all I care! ;-)
And, from MY experience, Bayesian filtering really DOES work remarkably well...
So, like I say, that's the important thing to me...

SpamBayes 1.0 released

Posted Oct 8, 2004 7:37 UTC (Fri) by kolloid (guest, #25282)

Blacklists are not effective. They cause too many false positives, because the maintainers of those lists are too often overzealous. And ISPs that host spammers often have innocent users, too.

SpamBayes 1.0 released

Posted Sep 30, 2004 22:08 UTC (Thu) by gswoods (subscriber, #37)

I've been using a pre-release of SpamBayes, and it is far and away the best spam filter I've ever used. I'd say fewer than one or two spams a week actually make it through SpamBayes misclassified or unsure.

We have blocklists here, and they help quite a bit. We reject hundreds of thousands of spams a month (and we only have 1200 users, not all of whom have blocklists turned on; so much for the old Postmaster's Credo). But for me personally, SpamBayes was a godsend. Without it, there would still be 100 or so spams a week getting through the blocklists.

I wrote my own script that takes one folder containing everything and another containing just the spam (occasionally I have to move a spam that got through into the spam folder by hand; otherwise it holds just the spams SpamBayes has already caught), removes the spams from the everything folder to produce the ham folder, and then runs the SpamBayes trainer. The results are fantastic. Except for the wasted bandwidth, spam is virtually a thing of the past for me now, and in the six months I've been using it, I have had exactly one false positive that anyone bothered to report.
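The script itself isn't shown, but its folder-subtraction step can be sketched with Python's standard `mailbox` module. The sketch makes these assumptions:

- Both folders are mbox files.
- A message can be matched across folders by its Message-ID header. Messages without one are kept as ham.
- The function names and paths are hypothetical.

The final training step is left out rather than guessing at the SpamBayes trainer's command-line options.

```python
import mailbox


def message_key(msg):
    """Identify a message by its Message-ID header (empty string if it has none)."""
    return (msg.get("Message-ID") or "").strip()


def build_ham_mbox(everything_path, spam_path, ham_path):
    """Append every message in `everything` whose Message-ID is not in `spam` to `ham`.

    Returns the number of messages written to the ham folder.
    """
    spam_box = mailbox.mbox(spam_path, create=False)
    try:
        spam_ids = {message_key(m) for m in spam_box}
    finally:
        spam_box.close()
    spam_ids.discard("")  # never treat a missing Message-ID as a spam match

    everything = mailbox.mbox(everything_path, create=False)
    ham = mailbox.mbox(ham_path)  # created if it does not exist yet
    kept = 0
    try:
        for msg in everything:
            if message_key(msg) in spam_ids:
                continue  # this one was caught (or hand-filed) as spam
            ham.add(msg)
            kept += 1
    finally:
        everything.close()
        ham.close()  # close() also flushes the new messages to disk
    return kept
```

Matching on Message-ID rather than comparing whole messages keeps the set lookup cheap. It does assume that moving a message between folders leaves its headers intact.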

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds