Open spam filtering rules considered harmful?
[Posted October 13, 2003 by corbet]
Readers of LWN know that we have long been fans of
SpamAssassin. Your editor, whose
personal spam load is approaching 500 messages per day, would long since have
ceased to function without it. Network life in the 21st century requires
either a well-hidden email address or some sort of effective filtering.
SpamAssassin's extensive arsenal of tests has
traditionally included checks for legitimate mail. In the past, mail which
identified itself as having been created with certain free email agents or
which contained a software patch was given some extra credit in the scoring
process. Spammers have often found and exploited those tests; for a while,
some of us were receiving mail which had been simultaneously "created" with
mutt and evolution. The usual response to such activity has been to remove
the tests in question.
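The weakness of such "extra credit" tests can be seen in a minimal sketch of additive scoring. Everything here is an assumption made for illustration: the rule names, the point values, and the message format are hypothetical, and are not SpamAssassin's actual rules or weights. The threshold of 5.0 is simply a plausible cutoff for the example.

```python
# Hypothetical rules and scores, for illustration only; these are not
# SpamAssassin's real rule names or weights.
RULES = {
    "MENTIONS_MIRACLE_PILLS": (
        lambda msg: "miracle pills" in msg["body"].lower(), 3.0),
    "HTML_ONLY": (
        lambda msg: msg["html_only"], 1.5),
    "SUBJECT_ALL_CAPS": (
        lambda msg: msg["subject"].isupper(), 1.5),
    # A negative test: extra credit for claiming to come from a free mail agent.
    "CLAIMS_MUTT": (
        lambda msg: msg["headers"].get("User-Agent", "").startswith("Mutt"), -2.5),
}

THRESHOLD = 5.0  # assumed cutoff: messages scoring at or above it are spam


def score(msg, rules=RULES):
    """Add up the points of every rule whose test matches the message."""
    return sum(points for test, points in rules.values() if test(msg))


def is_spam(msg, rules=RULES):
    return score(msg, rules) >= THRESHOLD


spam = {
    "subject": "BUY NOW",
    "body": "Miracle pills at low prices",
    "html_only": True,
    "headers": {},
}

# The same message with a forged header that triggers the negative test.
forged = dict(spam, headers={"User-Agent": "Mutt/1.4"})

# The response described above: drop the test that spammers learned to forge.
rules_without_credit = {
    name: rule for name, rule in RULES.items() if name != "CLAIMS_MUTT"
}

print(is_spam(spam))                          # caught
print(is_spam(forged))                        # slips under the threshold
print(is_spam(forged, rules_without_credit))  # caught again
```

In this sketch a single forged header costs the spammer nothing and buys back 2.5 points, which is why removing the test is the simplest fix.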
Most recently, some spammers have started adding fake PGP signatures (in
full HTML glory) to their output, in the hopes of slipping past
SpamAssassin. The PGP signature test was removed some time ago, but the
exploit was still enough to inspire this News.com article
which, among other things, says:
The attack on the software's filtering process highlights the
dangers of open-source projects, but it also reinforces the ability
of projects with active development teams to quickly respond to
such security holes.
The open nature of SpamAssassin's filtering is, thus, a "danger." Lest one
become too concerned about the "dangers" involved in using SpamAssassin,
however, there are a few things which should be kept in mind:
- Prospective spam can be tested against any filter, open or closed.
It would be surprising if spammers were not trying their products
against SpamAssassin in this way. They also, most likely, maintain
accounts with large ISPs and try to craft messages that get past the
filters those ISPs employ as well.
- SpamAssassin remains highly effective, even when spammers have had
plenty of time to study its tests and work out ways to get around it.
Open or not, SpamAssassin's rules are very good at identifying spam,
and they appear to be hard to get around. Fighting spam is an arms
race; it is surprising, actually, how rarely one has to upgrade
SpamAssassin to keep it effective.
- The Bayesian filtering techniques used by SpamAssassin (and many other
spam filtering systems) cannot be worked around in any easy way. A
quick test on about 6400 messages which had accumulated in your
editor's spam folder shows that the Bayesian filter is the decisive
test which condemns 15-25% of all incoming spam. Bayesian filters are
highly individualized, and their trained data is inaccessible to spammers. The
algorithm is entirely open, but that is little comfort to those who
would bury us in unwanted trash.
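The point about individualized Bayesian filters can be illustrated with a toy classifier. This is a plain naive Bayes sketch with add-one smoothing, not SpamAssassin's actual implementation; the `TinyBayes` class, the two users, and their training messages are all invented for the example. The algorithm is identical for both users, but the result depends entirely on token counts learned from each user's own mail, which a spammer never sees.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes filter (hypothetical; only tokens present are scored)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each distinct token once per message.
        self.counts[label].update(set(text.lower().split()))
        self.messages[label] += 1

    def spam_probability(self, text):
        # Start from the smoothed prior odds, then add per-token evidence.
        log_odds = math.log(
            (self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for token in set(text.lower().split()):
            p_spam = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


# Two users running the same open algorithm on different private mail.
developer = TinyBayes()
developer.train("patch for the kernel scheduler", "ham")
developer.train("review of the mutt patch", "ham")
developer.train("cheap pills online", "spam")
developer.train("cheap mortgage rates online", "spam")

pharmacist = TinyBayes()
pharmacist.train("pills order shipped to pharmacy", "ham")
pharmacist.train("cheap generic pills supplier invoice", "ham")
pharmacist.train("kernel of truth win lottery", "spam")
pharmacist.train("lottery winner claim prize", "spam")

message = "cheap pills"
print(round(developer.spam_probability(message), 3))
print(round(pharmacist.spam_probability(message), 3))
```

The same two words look like spam to the first filter and like ordinary business mail to the second, so a message tuned against one user's filter gains nothing against another's.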
The real lesson from the PGP signature "exploit," most likely, is that
negative tests (those which give a message credit toward being treated as
legitimate) will always be relatively easy for spammers to abuse. That
is presumably why SpamAssassin 2.60 contains almost none of these tests.
The most important point, however, is entirely different. For many of us,
email is a vital connection to the world. It is natural to be concerned
about trusting a program to filter our incoming mail for us; mistakes can
have real consequences. Would you really want to trust your mail to a
hidden, proprietary filtering scheme? Don't you want to know what
assumptions and biases have gone into the filtering decisions? Or, at
least, don't you want that information to be available to those with the
time and interest to check it out?
Allowing a black box to pass
judgment on one's incoming mail stream poses more dangers than an open,
free system ever could.