Fighting image spam
A number of spammers have been evading filters like SpamAssassin (SA) recently by encoding their messages as images. SA already has a set of rules that are meant to combat image spam, but the more recent messages (typically for stock scams or pharmacy products) have been crafted to avoid them. This would indicate, once again, that spammers are using SA to pre-test their messages and are modifying them to get through. SA developers, however, are up to the challenge and two specific countermeasures have been released.
The first technique, the FuzzyOCR plugin, uses Optical Character Recognition (OCR) software to pull words out of the images and then checks them against a blacklist of words to increase the SA score. It was quickly realized that spammers are using the same obfuscation techniques in the images that they have long used in text emails (misspelling words, using characters that look like others, etc.), so fuzzy matching was added to the plugin.
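As a rough illustration of what fuzzy matching against a blacklist can look like, here is a minimal Python sketch. It is not FuzzyOCR's code (the plugin is a Perl module and its matching algorithm is not described here); the edit-distance approach, the `BLACKLIST` contents, and the `MAX_RELATIVE_DISTANCE` threshold are all assumptions chosen for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# Hypothetical values, for illustration only.
BLACKLIST = ["stock", "pharmacy", "investor"]
MAX_RELATIVE_DISTANCE = 0.25  # tolerate roughly one edit per four letters


def fuzzy_hits(ocr_text, blacklist=BLACKLIST, max_rel=MAX_RELATIVE_DISTANCE):
    """Return (token, blacklisted word) pairs that match approximately."""
    hits = []
    for token in ocr_text.lower().split():
        for word in blacklist:
            if edit_distance(token, word) <= max_rel * len(word):
                hits.append((token, word))
    return hits


print(fuzzy_hits("Hot st0ck tip from your pharmcy"))
```

A filter built this way would add to the message score for each hit; the tolerance has to be kept small, since a loose threshold starts matching ordinary words.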
Unsurprisingly, there are already reports of images that put a light background of random 'snow' behind the text. This practice does not hurt readability for humans, but it does degrade the quality of the OCR output. The FuzzyOCR developers have quickly adapted by using a feature that removes smaller particles before doing the OCR scan. The question remains, of course, whether the OCR software will be able to keep up with obfuscations that remain readable to humans. Human pattern matching may be too good for the state of the art in OCR.
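The idea behind removing small particles can be sketched in a few lines of Python. This is only an assumption about the general technique, not the feature FuzzyOCR actually uses: the image is treated as a black-and-white bitmap (a list of rows of 0/1 values), and any connected group of set pixels smaller than a hypothetical `min_size` is erased, on the theory that letters are larger than specks of snow.

```python
from collections import deque


def remove_small_particles(bitmap, min_size):
    """Return a copy of bitmap with small 4-connected blobs cleared."""
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    cleaned = [row[:] for row in bitmap]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if not bitmap[y][x] or seen[y][x]:
                continue
            # Breadth-first search to collect one connected blob.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and bitmap[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Blobs below the size threshold are treated as noise.
            if len(component) < min_size:
                for cy, cx in component:
                    cleaned[cy][cx] = 0
    return cleaned


noisy = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
]
for row in remove_small_particles(noisy, min_size=3):
    print(row)
```

The weakness is easy to see: noise that touches the letters, or specks as large as a dot on an "i", cannot be separated by size alone.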
The FuzzyOCR plugin relies on several external programs from the netpbm tools, the open source OCR program gocr, and a number of other libraries and Perl modules. This is a fairly heavyweight approach, requiring a good bit of installation and configuration of the various pieces.
Another approach is the ImageInfo plugin, which does not require any external tools. It looks at the GIF and PNG headers of images in the email and calculates the area, in pixels, that they cover. Those values can be used in SA rules to increase the score of messages having the characteristics of the latest image spam. The current ruleset penalizes single images that are larger than 180K pixels as well as combinations of four or more images that total more than 180K. It seems very likely that spammers will test their images against the plugin, so this ruleset will have to evolve rather quickly.
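Reading the dimensions out of the image headers takes very little code, which is why no external tools are needed. The Python sketch below shows the general idea rather than the plugin's actual Perl implementation. The header offsets come from the GIF and PNG file formats; the function names are hypothetical, and reading "180K" as 180,000 pixels is an assumption.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
AREA_THRESHOLD = 180_000  # assumption: "180K" taken as 180,000 pixels


def image_dimensions(data: bytes):
    """Return (width, height) from a GIF or PNG header, or None."""
    # PNG: 8-byte signature, 4-byte chunk length, "IHDR", then
    # big-endian 32-bit width and height.
    if (len(data) >= 24 and data[:8] == PNG_SIGNATURE
            and data[12:16] == b"IHDR"):
        return struct.unpack(">II", data[16:24])
    # GIF: 6-byte signature, then little-endian 16-bit width and height.
    if len(data) >= 10 and data[:6] in (b"GIF87a", b"GIF89a"):
        return struct.unpack("<HH", data[6:10])
    return None


def looks_like_image_spam(images, threshold=AREA_THRESHOLD):
    """Apply the two size rules described above to a message's images."""
    areas = []
    for data in images:
        dims = image_dimensions(data)
        if dims is not None:
            areas.append(dims[0] * dims[1])
    if len(areas) == 1 and areas[0] > threshold:
        return True   # one large image
    if len(areas) >= 4 and sum(areas) > threshold:
        return True   # four or more images, large in total
    return False


# Build a minimal PNG header for a 600x400 image (240,000 pixels).
png = (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
       + struct.pack(">II", 600, 400))
print(image_dimensions(png), looks_like_image_spam([png]))
```

In SA itself the result would feed a scored rule rather than a yes/no answer, so that image size is only one signal among many.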
It is interesting to watch the battle over our email inboxes as the level of cleverness of the spammers seems to be increasing over time. This is clearly an arms race and one that spam filtering developers will have to stay on top of for the foreseeable future. Long term solutions to the problem do not seem to exist and this incremental measure-countermeasure war is here to stay.
| Index entries for this article | |
|---|---|
| GuestArticles | Edge, Jake |
