
mod_spambot

It has been known for years that spammers harvest web sites for email addresses to add to their lists. Various sites have responded by hiding or obfuscating email addresses found on their pages; some people go to extreme measures to keep their address from ever appearing on a page. One wonders what they are worried about; your editor receives a mere 3,000-4,000 spams per day at his highly public email address, after all.

Suffice to say that without SpamAssassin LWN would likely have collapsed under the flood years ago.

Some folks have decided that it is time to take a more active stance against the harvesting of email addresses from web pages. The result is an Apache module called mod_spambot; version 0.47 was recently released. The idea behind this module is to detect accesses by address harvesters and shut them down. Unfortunately, the approach this module takes is too simplistic to work in many situations.

mod_spambot is essentially a traffic-throttling module. If a given client pulls down too many pages in a given time period (the default is 100 pages in one hour), its access is cut off. There is also a "honeypot" option which will, instead, feed the (presumed) harvester a set of pseudo-random pages with bogus email addresses in them. This approach may well cut off some spammers, but anybody who has maintained a busy web site can see a few problems fairly quickly:

  • This approach will also cut off others who may be grabbing large numbers of pages from the site. Search engines come to mind, as do archive sites or anybody wanting to mirror a portion of a site. Cutting off people who thoughtlessly run a recursive wget to grab an entire site has some appeal; "download the site" operations account for a substantial part of LWN's bandwidth usage. But most site operators do not want to pull the plug on search engines and the like. mod_spambot allows the administrator to construct a whitelist, but who wants to figure out how to whitelist every possible search engine of interest?

  • There are some very large networks out there hiding behind a massive router and a single IP address. Traffic which looks like it originates from a single host may, in fact, be generated by hundreds of individual readers.

  • Increasingly large amounts of traffic are generated by robots whose sole purpose is to get a referrer URL onto a "top referrers" page somewhere on the site. Purveyors of Internet gambling experiences and particular types of imagery appear to like this approach to marketing. The interesting thing is that these accesses come simultaneously from a large number of IP addresses. These people, clearly, are using a network of zombie machines for their attacks. Spammers already use zombies to deliver their mail; it is hard to believe that they would not use those machines for address harvesting as well.

So throttling robots based on IP address will miss some attackers while blocking legitimate users of the site. It would be nice to prevent one's web site from being used as a resource by spammers, but this approach is not, yet, the way to that end.
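To make the throttling rule and its blind spots concrete, here is a minimal sketch in Python. It is not mod_spambot's code: the class `PageThrottle`, the helper `simulate_shared_gateway`, and the 192.0.2.1 address are invented for illustration, and only the 100-pages-per-hour default and the whitelist idea come from the description above.

```python
import time
from collections import defaultdict, deque


class PageThrottle:
    """Per-IP sliding-window throttle (hypothetical; not mod_spambot's code)."""

    def __init__(self, max_pages=100, window_seconds=3600, whitelist=()):
        self.max_pages = max_pages
        self.window_seconds = window_seconds
        self.whitelist = set(whitelist)
        self.hits = defaultdict(deque)  # ip -> timestamps of served requests

    def allow(self, ip, now=None):
        """Return True if the request should be served, False if cut off."""
        if ip in self.whitelist:
            return True
        now = time.time() if now is None else now
        recent = self.hits[ip]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_pages:
            return False
        recent.append(now)
        return True


def simulate_shared_gateway(throttle, gateway_ip, readers, pages_each):
    """Count refused requests when many readers share one address."""
    refused = 0
    for reader in range(readers):
        for _ in range(pages_each):
            # Each reader arrives one second after the previous one.
            if not throttle.allow(gateway_ip, now=float(reader)):
                refused += 1
    return refused
```

With the defaults, 200 readers behind one gateway who each fetch two pages within a few minutes use up the allowance after the first 50 readers, and the other 150 are refused. A harvester spread across 500 zombie addresses fetching 50 pages each never trips the limit at all.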



mod_spambot

Posted Aug 11, 2005 4:08 UTC (Thu) by JoeBuck (subscriber, #2330) [Link]

My company has several thousand employees, and all those in North America have their HTTP traffic routed through one gateway, with one IP address. The result is that any site that is reasonably popular with our employees, that uses this kind of throttling, shuts us out. One case I've run into is Slashdot's RSS feed, which often gives an insulting message saying to stop abusing it, because the mechanism is too stupid to tell that a hundred different people might be involved.

Why don't you put a proxy near the gateway?

Posted Aug 18, 2005 10:55 UTC (Thu) by hummassa (subscriber, #307) [Link]

(or fix your proxy server?)
Squid can be programmed to access some pages/addresses more infrequently
than others, and to dump cookies conditionally, AFAIK. So, all your
employees will have access to /. RSS, for instance -- and you can still go
out there only once each half-hour.
It will lower your bandwidth bill, too... or (if you have flat fee
bandwidth) improve the availability of bandwidth in your network
connection.

mod_spambot

Posted Aug 11, 2005 4:55 UTC (Thu) by port0 (guest, #6114) [Link]

Just a thought, but I've been wondering why someone doesn't take the reverse approach to this
problem. Rather than limit or shut down a robot, why not start feeding it thousands of bogus email
addresses? Or feed the robot addresses which your mail server could use as a trigger to black hole
an IP address? It could be used to make the spammers' systems less efficient, or seed their lists with
information which could be used against them.
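A rough sketch of what "feeding it bogus addresses" could look like, in Python. Everything here is an assumption made for illustration: the function names `bogus_addresses` and `honeypot_page`, the idea of seeding the generator from the request path, and the choice of the reserved `example.invalid` domain so that none of the addresses can ever be delivered.

```python
import random
import string


def bogus_addresses(seed, count=20, domain="example.invalid"):
    """Return `count` pseudo-random addresses, reproducible for a given seed."""
    rng = random.Random(seed)
    addresses = []
    for _ in range(count):
        length = rng.randint(5, 10)
        local = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        addresses.append(f"{local}@{domain}")
    return addresses


def honeypot_page(seed, count=20):
    """Build a small HTML page full of mailto: links for a harvester to collect."""
    items = "\n".join(
        f'<li><a href="mailto:{addr}">{addr}</a></li>'
        for addr in bogus_addresses(seed, count)
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"
```

Seeding from something like the requested URL means the same page always shows the same addresses, which makes the trap harder to distinguish from static content.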

mod_spambot

Posted Aug 11, 2005 7:56 UTC (Thu) by lolando (subscriber, #7139) [Link]

> why not start feeding it thousands of bogus email addresses?

Because the Internet is already choking on spam, and adding more spam (even unrouteable spam) to it will only have negative effects? More wasted bandwidth, more wasted CPU usage (SpamAssassin does use a bit of it), more wasted storage space (if only for the mail queues)... Everyone loses. Not even the spammers win, since the addresses are deliberately bogus.

project honeypot

Posted Aug 11, 2005 9:54 UTC (Thu) by copsewood (subscriber, #199) [Link]

> Just a thought, but I've been wondering why someone doesn't
> take the reverse approach to this problem.

Someone has. http://www.projecthoneypot.org/ feeds thousands of addresses to spambots. Messages sent to these addresses identify the addresses used by the spammers for harvesting, and also collect evidence usable for prosecution.

project honeypot

Posted Aug 11, 2005 12:59 UTC (Thu) by smitty_one_each (subscriber, #28989) [Link]

I look towards the use of smart cards with keys, as currently rolled out within the US military, as the way ahead.
Anonymity, while nice, might be too expensive in the face of spam.

Spam Solution

Posted Aug 11, 2005 19:16 UTC (Thu) by GreyWizard (guest, #1026) [Link]

Your post advocates a

(X) technical ( ) legislative ( ) market-based ( ) vigilante

approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)

( ) Spammers can easily use it to harvest email addresses
( ) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
(X) It will stop spam for two weeks and then we'll be stuck with it
(X) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
( ) Requires too much cooperation from spammers
(X) Requires immediate total cooperation from everybody at once
(X) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business

Specifically, your plan fails to account for

( ) Laws expressly prohibiting it
(X) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
( ) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
(X) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
(X) Armies of worm riddled broadband-connected Windows boxes
( ) Eternal arms race involved in all filtering approaches
( ) Extreme profitability of spam
( ) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook

and the following philosophical objections may also apply:

( ) Ideas similar to yours are easy to come up with, yet none have ever
been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
(X) Countermeasures must work if phased in gradually
( ) Sending email should be free
(X) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough

Furthermore, this is what I think about you:

(X) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your
house down!

(Form taken from http://craphound.com/spamsolutions.txt)

Spam Solution

Posted Aug 18, 2005 21:40 UTC (Thu) by j1m+5n0w (guest, #20285) [Link]

I think that's overly pessimistic. I assume the grandparent post meant a solution in which all email is digitally signed by a smartcard using public key cryptography.

This does not require cooperation from everyone all at once. Those who don't want to use it can ignore the signature attached to messages, and they can still send mail to others (but with increasing risk of it getting automatically thrown out as spam as more people sign their email).

Public key distribution does not have to be centralized. Web-of-trust schemes can work as well (though I have yet to see a good implementation of this). If someone starts sending you spam, you can stop trusting whoever is linking you to them in the web of trust.

I fail to see how armies of worm riddled broadband-connected Windows boxes are a threat to this scheme.

Your objections are but to say "whatever we try will cause at least a minor inconvenience and doesn't solve every problem, so why bother doing anything".

Spam Solution

Posted Aug 18, 2005 22:03 UTC (Thu) by GreyWizard (guest, #1026) [Link]

> This does not require cooperation from everyone all at once.

Wrong. So long as even one person that you might want to talk to does not use a cryptographic token to sign all email you must be prepared to examine your non-signed email which will include spam. Since that's exactly what you have to do now, such a system is useless for fighting spam until everyone adopts it -- in other words, it requires cooperation from everyone all at once.

> Public key distribution does not have to be centralized.

No, but it usually is. Managing a web of trust is a complex task that technical people are able to do in some cases but the vast majority of computer users are not. Believe me I've tried. In any case the post I was responding to specifically mentioned the military where a web of trust is not generally the way things work.

> I fail to see how armies of worm riddled broadband-connected Windows boxes are a threat to this scheme.

A worm riddled (broadband-connected is not as relevant but it was on the form) Windows box can be directed to send spam signed with your cryptographic token whenever you are logged in. That makes the token useless against spam, at least for anyone who wants to be able to communicate with users of insecure computer systems. That's more or less everyone.

> Your objections are but to say "whatever we try will cause at least a minor inconvenience and doesn't solve every problem, so why bother doing anything".

Wrong. My objection points out that casually articulated, wildly simplistic solutions to the spam problem are common as dirt. Credible attempts at a sound solution exist in many forms but what you and smitty_one_each propose is not among them.

Spam Solution

Posted Aug 19, 2005 3:37 UTC (Fri) by j1m+5n0w (guest, #20285) [Link]

> So long as even one person that you might want to talk to does not use a cryptographic token to sign all email you must be prepared to examine your non-signed email which will include spam.

Requiring everyone who sends me mail to sign their messages is not the same as requiring everyone who sends email to anyone else on the internet to sign their messages. And even if some people don't sign their email, I would rather have a partial whitelist than none at all.

> Managing a web of trust is a complex task that technical people are able to do in some cases but the vast majority of computer users are not. Believe me I've tried.

That's a fair objection. My belief is that the tools just aren't good enough, and they use ad-hoc trust metrics like "trust anyone reachable from me in three steps or less". Reputation systems are an area of current research, but some promising algorithms exist for this sort of thing.
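For illustration, the ad-hoc "reachable in three steps or less" metric amounts to a bounded breadth-first search over a graph of who has signed whose key. The sketch below is in Python; the function name `trusted_within`, the dictionary representation of the graph, and the names in it are all made up for the example and do not describe any existing tool.

```python
from collections import deque


def trusted_within(graph, me, max_steps=3):
    """Return {key: steps} for every key reachable from `me` in max_steps or fewer.

    `graph` maps each key owner to the owners whose keys they have signed.
    """
    distance = {me: 0}
    queue = deque([me])
    while queue:
        person = queue.popleft()
        if distance[person] == max_steps:
            continue  # do not follow signatures beyond the step limit
        for signed in graph.get(person, ()):
            if signed not in distance:
                distance[signed] = distance[person] + 1
                queue.append(signed)
    return distance


# Hypothetical web of trust: each person has signed the next one's key.
web = {"me": ["alice"], "alice": ["bob"], "bob": ["carol"], "carol": ["dave"]}
trusted = trusted_within(web, "me")  # includes carol (3 steps) but not dave (4)
```

Dropping the link to whoever introduced a spammer is then just a matter of deleting one edge and recomputing. The weakness is visible too: the metric treats every path as equally good, which is why it counts as ad hoc.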

> A worm riddled (broadband-connected is not as relevant but it was on the form) Windows box can be directed to send spam signed with your cryptographic token whenever you are logged in

That's an issue of lousy token design. Really, the things shouldn't sign anything without some form of user acknowledgement, like perhaps pushing a button on the smartcard. On the other hand, maybe this isn't really a problem - anyone who sends spam signed as themself is going to get on a lot of blacklists real fast. I'd be more concerned about bots creating new identities for themselves, and sending messages that appear to come from random people, but those messages should be given a low priority if they have no connection to the recipient's web of trust.

Really, I think this could work well if the correct infrastructure was in place. Unfortunately, smartcards are hard for the average person to obtain and use, we don't have a widely used public key architecture, and most people (including myself and everyone I know) don't bother to sign their mail, but I don't see these as insurmountable problems.

Spam Solution

Posted Aug 19, 2005 16:11 UTC (Fri) by GreyWizard (guest, #1026) [Link]

> Requiring everyone who sends me mail to sign their messages is not the same as requiring everyone who sends email to anyone else on the internet to sign their messages.

Perhaps you only want to be able to receive messages from a small group of friends. Fine. Solving the spam problem is trivial for you. Choose a protocol other than SMTP or just invent one yourself. No cryptographic tokens are necessary. Everyone else will keep using SMTP because sometimes a stranger is not a spammer but a long lost friend or a prospective business partner. Such people can't reject mail from people without cryptographic tokens until just about everyone is using them.

Therefore such a scheme requires cooperation from everyone at once. Why didn't I say so in the first place? Oh, wait, I did.

> And even if some people don't sign their email, I would rather have a partial whitelist than none at all.

Since when does a whitelist require cryptographic tokens? Free email services like Hotmail already have such technology. Tokens seem to have the advantage of filling in the list for you but they don't because spammers can get tokens too. Any process for acquiring tokens that a typical email user would tolerate could be adopted by spammers for mass production of temporary accounts. You'll spend just as much time sifting through messages that have a "low priority in your web of trust" as you do on your inbox today, except you'll get to pay extra for the illusion of security.

> Really, the things shouldn't sign anything without some form of user acknowledgement, like perhaps pushing a button on the smartcard.

No, sorry that just doesn't work. Let's pretend for a moment that people won't do what they always do which is to find some ingenious way of automatically pushing the button. This still won't work because the hostile code on the machine will wait until you try to send a legitimate email and then hijack your acknowledgement for a spam run. What you need is a small screen that shows the actual contents of the email you are preparing to sign, but that's beyond the scope of cryptographic tokens so you don't get to pretend that's what smitty_one_each was talking about.

> Really, I think this could work well if the correct infrastructure was in place.

Go ahead and set that up then. When your project takes the world by storm I'll be sorry I dismissed you as a clueless crank. Until then forgive me if I remain skeptical.

mod_spambot

Posted Aug 11, 2005 13:07 UTC (Thu) by arcticwolf (guest, #8341) [Link]

http://www.monkeys.com/wpoison/

mod_spambot

Posted Aug 11, 2005 10:22 UTC (Thu) by Wummel (subscriber, #7591) [Link]

Another approach is to include non-visible (with CSS display:none) links to CGI pages with trap email addresses. A trap email address includes the client IP address and the current date (for example spamtrap-$ip+$date@myhost.com).

Advantages:
  • Regular users do not see the non-visible link and do not use it.
  • The trap emails can be fed to a script that automatically adds the client IP addresses to a blacklist. Plus they provide legal proof that the client host misused the site.
Disadvantages:
  • Some browsers might not understand the display:none CSS declaration. In that case, the CGI page can contain a readable warning telling users not to send any mail to the trap email address.
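A sketch of the trap-address scheme in Python, using the spamtrap-$ip+$date@myhost.com pattern from above. The function names, the ISO date format, and the restriction to IPv4 addresses are assumptions made for this example; a real CGI page and mail filter would need more care.

```python
import re
from datetime import date

TRAP_DOMAIN = "myhost.com"  # domain taken from the example pattern above
TRAP_RE = re.compile(
    r"^spamtrap-(\d{1,3}(?:\.\d{1,3}){3})\+(\d{4}-\d{2}-\d{2})@"
    + re.escape(TRAP_DOMAIN)
    + r"$"
)


def make_trap_address(client_ip, day):
    """Build the address the CGI page shows to the client at `client_ip`."""
    return f"spamtrap-{client_ip}+{day.isoformat()}@{TRAP_DOMAIN}"


def parse_trap_address(address):
    """Return (ip, date) for a trap address, or None for anything else."""
    match = TRAP_RE.match(address.strip().lower())
    if match is None:
        return None
    ip, day = match.groups()
    try:
        return ip, date.fromisoformat(day)
    except ValueError:  # looks like a date but is not a valid one
        return None


def blacklist_from_recipients(recipients):
    """Map each harvesting IP to the earliest date it collected a trap address."""
    blacklist = {}
    for recipient in recipients:
        parsed = parse_trap_address(recipient)
        if parsed is None:
            continue
        ip, day = parsed
        if ip not in blacklist or day < blacklist[ip]:
            blacklist[ip] = day
    return blacklist
```

Feeding the envelope recipients of incoming mail to `blacklist_from_recipients` yields the addresses that fetched the trap page, along with the day they did so, which is the record a blacklist script would need.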

mod_spambot

Posted Aug 12, 2005 8:41 UTC (Fri) by oak (guest, #2786) [Link]

Hm... With the help of some server-side scripting, the fake email addresses
could have the page requester's domain address.

I.e. the ones that harvest the addresses would get most of the spam
too. No doubt they would catch on fairly quickly, but I still like this
idea. :-)

mod_spambot

Posted Aug 15, 2005 20:41 UTC (Mon) by jmason (guest, #13586) [Link]

'Suffice to say that without SpamAssassin LWN would likely have collapsed under the flood years ago.'

Thanks for the thumbs up! I met PGN (moderator of the RISKS forum) a few weeks back at a conference and he said pretty much the same thing -- it's great to hear.

--j. (Apache SpamAssassin PMC)

mod_spambot

Posted Aug 20, 2005 23:57 UTC (Sat) by csamuel (✭ supporter ✭, #2624) [Link]

I'd like to add another extremely grateful thank you to everyone who has
contributed to SpamAssassin over the years, thanks for making email
bearable for me and my wife (she's an author, and was getting deluged)!

Chris

Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds