
The MIT 2004 Spam Conference

January 21, 2004

This article was contributed by Joe 'Zonker' Brockmeier.

You know that spam prevention efforts have reached fever pitch when a spam conference brings together lawyers, developers, economists, Eric Raymond and a representative from Microsoft to discuss the problem and ways to stop it. MIT hosted a conference on this topic on January 16, and we decided to check out the webcast to see what kind of work is being done in this area. The answer is, there's quite a bit of work going on, and the future looks much more encouraging than you might think.

Lawyers Jon Praed and Matthew Prince both spoke about spam from the legal perspective. Praed discussed experiences in suing spammers. Interestingly, Praed wasn't as negative about the recent CAN-SPAM Act as many in the anti-spam community have been. Praed noted that legal solutions can often do something that technical solutions alone have failed to do: significantly drive up the cost of sending spam by requiring spammers to deal with legal bills. He also said that 2003 was a banner year for legal efforts against spam, because it brought the first arrests solely for spamming. According to Praed, the CAN-SPAM Act is effective, in that it makes it clear that spamming in and of itself is a crime.

Prince was less enthusiastic about CAN-SPAM. He pointed out that 37 state spam laws had been passed prior to CAN-SPAM; now all 37 are pre-empted by the federal law, which is weaker than most of the state laws. But even the stronger state laws have been largely ineffectual at stopping spam. He also noted that spam laws were not written to address the volume of spam, which is the problem we now face, but to counter the problem of fraud in spam.

Prince did bring up the McCain amendment to CAN-SPAM for praise, and said it had received almost no coverage. Essentially, the McCain amendment says that when prosecutors are going after a spammer, they don't necessarily have to go after the sender. It allows prosecutors to attach liability to advertisers, which may be much more effective than having to go after the spammer.

Prince also said that we would have to remove the anonymity of email to solve the legal problem of spam. Washington state has been the most successful because its law includes a registry of email addresses that are located in the state. He said that it was necessary to establish a national do-not-spam registry, which would establish jurisdiction to allow spammers to be sued and prosecuted.

Both Prince and Praed agreed that the important thing about legal solutions is that they impose costs on spammers.

Yahoo's Miles Libbey talked about trends in spam, as seen passing through Yahoo Mail. Like many other speakers, Libbey saw an emerging emphasis on spammers trying to hide their identity and attempting to make messages more random to avoid filters. On a scary note, Libbey said that Yahoo had seen spammers react to its anti-spam filters within two hours.

Another presentation focused on finding economic means to deal with spam. Thede Loder, Marshall Van Alstyne, and Rick Wash outlined the Attention Bond Mechanism (ABM), under which senders would have to put up a "bond" with each message. Users could claim the money if a message was unwanted, or release it back to the sender if the message was wanted.

Assuming a working model could be found and implemented, they say this would be of benefit to users and marketers. According to Loder, Van Alstyne and Wash, it could be cheaper than direct mail, while giving the recipient an incentive not to block the email automatically. Either the message would be of benefit to the user, or the user could reap a small financial gain by accepting the message. Most importantly, this model would return the control of a user's inbox to the user where it belongs and shift the burden to marketers.
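To make the bond idea concrete, here is a minimal sketch in Python of how a single message might be settled. It is only an illustration of the mechanism as described above: the `Account` class, the `deliver_with_bond` function, and the amounts in cents are all hypothetical names and units, not anything presented by Loder, Van Alstyne and Wash.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical balance holder for a sender or a recipient."""
    balance_cents: int


def deliver_with_bond(sender, recipient, bond_cents, recipient_wants_message):
    """Escrow a bond for one message, then settle it on the recipient's verdict."""
    if sender.balance_cents < bond_cents:
        raise ValueError("sender cannot cover the bond")

    # The bond leaves the sender's account while the message is pending.
    sender.balance_cents -= bond_cents

    if recipient_wants_message:
        # Wanted mail: the bond goes back to the sender.
        sender.balance_cents += bond_cents
        return "released"

    # Unwanted mail: the recipient keeps the bond.
    recipient.balance_cents += bond_cents
    return "claimed"
```

In this sketch a sender who only writes to people who want the mail never loses money, while a bulk mailer pays for every recipient who claims the bond.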

Along the same lines, Eric Johansson of CAMRAM talked about a hybrid system that would incrementally add a money-free, sender-pays scheme to email. Instead of being money-based, stamp creation would be time-based. That is, each "stamped" email would contain a 22-bit or 23-bit stamp that costs a given amount of computing time to generate. Spending that time on every message would be somewhat prohibitive for spammers, who need to send email in volume to make money.
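A time-based stamp of this kind can be sketched as a proof-of-work puzzle. The Python below is a hashcash-style illustration, not CAMRAM's actual stamp format: it assumes that an "N-bit" stamp means a SHA-1 hash with at least N leading zero bits, and the `recipient:nonce` layout and the function names are made up for the example.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def mint_stamp(recipient: str, bits: int) -> str:
    """Search for a nonce whose stamp hashes to `bits` leading zero bits."""
    for nonce in count():
        stamp = f"{recipient}:{nonce}"
        digest = hashlib.sha1(stamp.encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return stamp


def verify_stamp(stamp: str, recipient: str, bits: int) -> bool:
    """Checking costs one hash, however long the stamp took to mint."""
    address, _, _nonce = stamp.rpartition(":")
    digest = hashlib.sha1(stamp.encode()).digest()
    return address == recipient and leading_zero_bits(digest) >= bits
```

Under these assumptions, minting a 22-bit stamp takes about four million hash attempts on average (2 to the 22nd power), while verifying one takes a single hash. Because the recipient's address is part of the stamp, the work cannot be reused across a mailing run.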

Of course, there were also many discussions of technical means to filter and block spam. William Yerazunis spoke about ways to go beyond the accuracy of Bayesian and Markovian spam filtering. One interesting note from Yerazunis' talk was that some spammers are getting desperate enough to actually sign up for "well-credentialed" email lists in an effort to penetrate those lists and send spam to the mailing list members. He also noted that the "Habeas Haiku" method of whitelisting mail has actually become an indicator of spam rather than an indicator that the email is clean, as spammers have been brazenly using the Haikus in their spam.

Marty Lamb spoke about Martian Software's TarProxy, or "creating pain for spammers." TarProxy throttles connections between the spammer and an SMTP server, slowing the rate at which a spammer can send spam and thereby making spamming more costly. It also would cause headaches for administrators of open relays, with the eventual goal of forcing them to fix the configuration of their servers.
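The throttling idea can be illustrated with a small delay policy. The Python sketch below is not TarProxy's algorithm, which the talk summary does not describe; it simply assumes that a proxy counts the suspected spam messages seen on a connection and stalls longer before each subsequent SMTP reply. The function name, the doubling rule, and the 60-second cap are all hypothetical.

```python
def tarpit_delay(spam_messages_seen: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Return how many seconds to stall before answering the next SMTP command."""
    if spam_messages_seen <= 0:
        # Connections that have sent no suspected spam are not slowed at all.
        return 0.0
    # Double the delay for each further suspected spam, up to the cap.
    # The exponent is bounded so that very large counts cannot overflow.
    exponent = min(spam_messages_seen - 1, 16)
    return min(cap, base * 2 ** exponent)
```

A proxy built this way would sleep for the returned number of seconds before relaying each reply, so a legitimate sender sees no delay while a bulk sender quickly drops to one message a minute per connection.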

Jonathan Zdziarski managed to present two topics in the allotted 20-minute slot. Zdziarski spoke about using "chained tokens" to provide more information when filtering spam, rather than using a single word as a token. The technique works on the concept that it is easier and less risky to identify spam by multiple words or tokens than by a single word or token. Tokens can include mail headers, HTML fragments and other bits of an e-mail. A white paper discussing the technique can be found on the DSPAM website in PDF.
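As a rough illustration of chained tokens, the Python sketch below emits every single word of a message plus every pair of adjacent words. This is a simplified reading of the idea, not DSPAM's tokenizer: splitting on whitespace, lowercasing, and joining pairs with "+" are assumptions made for the example.

```python
def chained_tokens(text: str) -> list[str]:
    """Return single-word tokens followed by adjacent-word pairs."""
    words = text.lower().split()
    # Pair each word with the one that follows it.
    pairs = [f"{first}+{second}" for first, second in zip(words, words[1:])]
    return words + pairs
```

For the text "Free money now" this yields the tokens free, money, now, free+money and money+now. A filter that keeps statistics on the pairs can treat "free+money" as far stronger evidence than either word alone.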

Zdziarski is also working with Bill Yerazunis on an RFC for MIME encoding for message inoculation, to create a message format that allows different spam filters on different servers to share inoculation information.

John Graham-Cumming taped his presentation beforehand. Instead of discussing how to block spam, Graham-Cumming's presentation focused on how spammers could beat spam filters by using filters like POPFile to detect "good" words that get through a spam filter. Graham-Cumming predicts that spammers will continue to react to adaptive filtering, and said that it would be possible for a spammer to insert "web bugs" into spam to help train filters to see which messages are delivered and which are not. Graham-Cumming said that it would be necessary to choke off feedback to spammers, such as bounces and SMTP error messages, to prevent adaptive filtering from being turned against spam filters.

Eric Raymond was also on hand at the conference, and spoke about several topics. One topic Raymond discussed is a provision in the CAN-SPAM Act that requires the Department of Commerce to consult with the IETF on spam-labeling standards. While the Act directs the department to consult with the IETF on this issue, the IETF does not have any labeling standards at the moment. Raymond says he is working on a draft RFC for this purpose, one that could "pass constitutional muster."

Raymond also discussed Sender Permitted From (SPF). SPF allows a receiving server to check whether the IP address a message arrives from is a valid sender for the domain it claims to come from, and to set policies based on that information. To use SPF, you add information to DNS that informs the world which IP addresses are valid for sending e-mail from your domain. When spammers attempt to spoof "from" headers and so on, a server using SPF can check whether the connecting IP address matches the valid IP addresses listed in the DNS records.
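The check Raymond described can be sketched in a few lines of Python. This is a heavily simplified illustration, not a conforming SPF implementation: the `PUBLISHED` dictionary stands in for a real DNS lookup, only `ip4:` entries are understood, and the record shown for example.com is invented for the example.

```python
import ipaddress

# Stand-in for DNS: a hypothetical record listing the permitted sending addresses.
PUBLISHED = {
    "example.com": "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.7 -all",
}


def sender_permitted(domain: str, client_ip: str):
    """Return True or False if the domain publishes a policy, None if it has none."""
    record = PUBLISHED.get(domain)
    if record is None:
        return None  # no policy published, so nothing can be concluded

    address = ipaddress.ip_address(client_ip)
    for term in record.split()[1:]:  # skip the leading version tag
        if term.startswith("ip4:"):
            if address in ipaddress.ip_network(term[4:]):
                return True
    # Simplification: anything not explicitly listed is treated as not permitted.
    return False
```

With this table, mail claiming to be from example.com is accepted from 192.0.2.55 but not from 203.0.113.9, while a domain with no record gives no answer either way. The forwarding problem is visible here too: a forwarding host's address will not be in the original domain's list.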

Raymond admitted that there are compatibility problems with SPF. For example, SPF breaks forwarding and causes problems for roving users who need to send mail from different IP addresses. He noted that no one technology for stopping spam is perfect, but several tactics can work together as a "drug cocktail" to help end the spam problem.

For those interested in attending an anti-spam conference before MIT's 2005 conference, several speakers plugged the First Conference on Email and Anti-Spam (CEAS), which is scheduled for July 30 and 31 in Mountain View, California. For those working on anti-spam technologies or in related areas, there is a call for papers with a deadline of April 16.

The full presentations from the MIT conference are available in RealPlayer format at the Spam Conference website.



The MIT 2004 Spam Conference

Posted Jan 22, 2004 15:02 UTC (Thu) by farnz (guest, #17727) [Link]

I noticed very few people discussing the possible gains from using OpenPGP as a way to limit e-mail spam. Simply put, signing messages with a valid signature is non-trivial; forging a signature is even harder. At this point, any unsigned mail is suspicious; any mail that's signed by someone I trust (because I trust the signer) or encrypted to me (which is an operation per recipient) is definitely not spam. Any mail signed by one of the listed keys is definitely spam. Any mail whose signature is from an unrecognised key is suspicious.

It doesn't take much to distribute a list of keys known to have been used by spammers, since keys are small (typically a few kilobits), and can fit into a DNS-based RBL. The only way round it is to somehow obtain a trusted key (which is likely to be hard, since a key is only trusted if I have said I trust it, or enough people whose keys I trust highly have said it's trusted), or to encrypt messages to the recipient, which is an operation per recipient, and drives up the cost of spam considerably.

Of course, this system has a major problem (probably insurmountable, as with most of these technical/social problems): how do we get all users to use OpenPGP?

The MIT 2004 Spam Conference

Posted Jan 22, 2004 15:42 UTC (Thu) by trutkin (guest, #3919) [Link]

That sounds pretty similar to the "stamp" idea that was mentioned in the conference where
the time needed to generate a stamp is prohibitive to mass mailings.

The MIT 2004 Spam Conference

Posted Jan 22, 2004 19:16 UTC (Thu) by farnz (guest, #17727) [Link]

You're right, it is similar. The difference is that most of the infrastructure needed is already available. Mozilla Mail has the EnigMail extension. KMail handles OpenPGP messages. Evolution handles OpenPGP. There are plugins available for Outlook and Outlook Express. "All" (but it's a big all) that's needed is users to switch to signing all their mail, and encrypting where possible, and changing MTAs to look at PGP signatures is worthwhile.

Granted, this is very much a Final Ultimate Solution to the Spam Problem, but it brings other side benefits as well as solving spam (it solves the problem of proving someone sent an e-mail, and it solves any issues with e-mail security).

The MIT 2004 Spam Conference

Posted Jan 29, 2004 12:34 UTC (Thu) by AdamInPoland (guest, #19036) [Link]


The other difference is that the stamp system can be set up fairly transparently by sysadmins, thus overcoming the social barriers to implementing OpenPGP. It doesn't have to be obligatory, but stamped mail could start out by being another factor that a filtering system looks at. That way, it can very quickly become a standard.

Obviously getting everyone to use PGP is a better solution, but while we're waiting for a solution to that one, stamping to me seems like a big part of the answer.

The MIT 2004 Spam Conference

Posted Jan 29, 2004 12:40 UTC (Thu) by esjatharvee (guest, #19038) [Link]

One of the things I hope to accomplish within the framework of the camram project is an opportunistic signature system. Two parties introduce themselves using proof-of-work stamps, then continue using signatures on e-mail as proof of identity. Opportunistic signatures increase the barriers against spammers' ability to forge but do not create a centralized identity system which can be used for censorship or control.

For more information, take a look at www.camram.org.

---eric

The MIT 2004 Spam Conference

Posted Jan 30, 2004 20:40 UTC (Fri) by jmason (guest, #13586) [Link]

farnz -- it has occurred to people before, as have other public-key-based auth methods.

The problem is key distribution. Without solving that, it'll help a small number of techies correspond with their existing groups of friends -- and that problem is already solved, for example with SA's autowhitelist. It's the mail from someone *new*, or from some automated mail-generating app, that's the problem.

(BTW we have a bug open in SpamAssassin's bugzilla to implement opportunistic PGP keychecking using the user's keyring anyway, if you fancy helping out ;)

The MIT 2004 Spam Conference

Posted Jan 22, 2004 15:29 UTC (Thu) by RobSeace (subscriber, #4435) [Link]

> how spammers could beat spam filters by using filters like POPFile to
> detect "good" words to get through a spam filter.

That really won't work very well... Because, the whole point with Bayesian
filtering (and, why it requires personal training) is that everyone's own
"good" words are going to be different... So, no spammer is going to be
able to magically determine what everyone's "good" words are... Sure, they
might manage to determine some that have a high probability of being in most
people's "good" list, but at the same time, they ALSO need to use some "bad"
words in order to pitch whatever the hell it is they're selling, too... So,
their messages are unlikely to appear as truly "good" messages, no matter
WHAT they do, short of silencing their own pitch entirely, and actually
forging a legit-looking message of some sort...

I, for one, haven't had much of any problem with the recent trend of spammers
inserting various random "good" words into their messages... Both SpamBayes
and BogoFilter seem to handle it just fine, and have no trouble seeing
through their trickery... It's only very rarely that a particularly
clever/lucky spam will make it through, these days... (The other day, I
got one loaded with tons of techy computer-related terms, which slipped
through with a score of 0.00! Impressive... But, one run through the
retraining, and it scored as 1.00... There were enough non-good words in
it to key off of for proper identification, I guess... I suspect it was
the gratuitous misspellings that were all over the place... Hell, one
could almost construct a reliable spam-filter using nothing more than
ispell/aspell, given spammers' seeming love of deliberately misspelling
everything... ;-))

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds