August 23, 2006
This article was contributed by Jake Edge.
A number of spammers have been evading filters like
SpamAssassin (SA)
recently by encoding their messages as images. SA already has a
set of rules that are meant to combat image spam, but the more recent
messages (typically for stock scams or pharmacy products) have been crafted
to avoid them. This would indicate, once again, that spammers are using
SA to pre-test their messages and are modifying them to get through. SA
developers, however, are up to the challenge and two specific
countermeasures have been released.
The first technique uses Optical Character Recognition (OCR) software to
pull words out of the images and then uses a blacklist of words to
increase the SA score. It was quickly realized that spammers are using the
same obfuscation techniques in images that they have long used in
text emails (misspelling words, using characters that look like others, etc.),
so fuzzy matching was added to the
plugin.
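The idea of fuzzy matching against a blacklist can be sketched in a few lines of Python. This is only an illustration of the general technique, not the plugin's actual code. The function names, the 25% threshold, and the use of plain Levenshtein edit distance are all assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def fuzzy_hits(ocr_words, blacklist, max_ratio=0.25):
    """Return (ocr_word, blacklisted_word) pairs that are close enough.

    max_ratio is a hypothetical tuning knob: a word matches when its
    edit distance is at most that fraction of the blacklisted word's length.
    """
    hits = []
    for word in ocr_words:
        lowered = word.lower()
        for bad in blacklist:
            if edit_distance(lowered, bad) <= max_ratio * len(bad):
                hits.append((word, bad))
    return hits
```

With this sketch, an OCR result such as "V1agra" would still match a blacklist entry of "viagra", since the two differ by a single substituted character. Each hit could then contribute to the message's score.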
Unsurprisingly, there are already reports of
images that put a light background of
random 'snow' behind the text.
This practice does not affect readability for
humans, but it does degrade the quality of the OCR output. The
FuzzyOCR developers have quickly adapted by using a feature that removes
smaller particles before doing the OCR scan. The question remains, of course,
whether the OCR software will be able to keep up with obfuscations that
remain readable to humans. Human pattern matching may be too good for
the state of the art in OCR.
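To make the particle-removal idea concrete, here is a rough Python sketch of one way to do it: find connected blobs of dark pixels in a black-and-white bitmap and erase any blob below a size threshold. This is an assumed approach for illustration only; the article does not say how the feature FuzzyOCR relies on works internally. The function name and the `min_size` parameter are hypothetical.

```python
from collections import deque


def remove_small_particles(bitmap, min_size=3):
    """Return a copy of bitmap (rows of 0/1) with small blobs of 1s cleared.

    A blob is a group of 1-pixels connected horizontally or vertically.
    Blobs with fewer than min_size pixels are treated as 'snow'.
    """
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    cleaned = [row[:] for row in bitmap]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if bitmap[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill collects one connected blob.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and bitmap[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) < min_size:
                for by, bx in blob:
                    cleaned[by][bx] = 0
    return cleaned
```

The weakness is easy to see from the sketch: a spammer only needs to make the noise specks slightly larger than the threshold, or to connect them to the letters, and a size-based filter can no longer tell them apart from text.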
The FuzzyOCR plugin uses several external programs from the
netpbm tools, the
gocr open-source OCR program,
and several other libraries and Perl modules.
This is a fairly heavy-handed approach, requiring a good bit of installation
and configuration of the various pieces.
Another approach is the
ImageInfo
plugin, which does not require any external tools. It looks at the GIF and PNG
headers of images in the email and calculates the area, in pixels, that they
cover. Those values can be used in SA rules to increase the score of those
having the characteristics of the latest image spam. The current ruleset
penalizes single images that are larger than 180K pixels as well as
combinations of four or more images that total more than 180K pixels. It seems
very likely that spammers will run the plugin and test their
images against it, so this ruleset will likely have to evolve rather quickly.
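The header-only approach is simple enough to sketch in Python. The byte offsets below follow the published GIF and PNG file layouts; everything else is an assumption made for illustration: the function names, reading "180K" as 180,000 pixels, and reading "single images" as a message with exactly one image. The real plugin is a SpamAssassin plugin and will differ in detail.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
AREA_LIMIT = 180_000  # assumption: "180K pixels" taken as 180,000


def image_area(data: bytes):
    """Return width * height from a GIF or PNG header, or None if unknown."""
    # GIF: 6-byte signature, then little-endian 16-bit width and height.
    if data[:6] in (b"GIF87a", b"GIF89a") and len(data) >= 10:
        width, height = struct.unpack("<HH", data[6:10])
        return width * height
    # PNG: 8-byte signature, 4-byte chunk length, b"IHDR",
    # then big-endian 32-bit width and height.
    if (data[:8] == PNG_SIGNATURE and data[12:16] == b"IHDR"
            and len(data) >= 24):
        width, height = struct.unpack(">II", data[16:24])
        return width * height
    return None


def matches_image_spam_profile(images):
    """Apply the two area rules described above to a list of image byte strings."""
    areas = [a for a in map(image_area, images) if a is not None]
    single_large = len(areas) == 1 and areas[0] > AREA_LIMIT
    many_large = len(areas) >= 4 and sum(areas) > AREA_LIMIT
    return single_large or many_large
```

Because only the first few bytes of each attachment are read, a check like this is very cheap compared with running OCR, which is presumably part of its appeal. It is also easy to sidestep, for instance by lying about dimensions in the header or by splitting a message across two or three images.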
It is interesting to watch the battle over our email inboxes as the level
of cleverness of the spammers seems to be increasing over time. This is
clearly an arms race and one that spam filtering developers will have to
stay on top of for the foreseeable future. Long-term solutions to the problem
do not seem to exist, and this incremental measure-countermeasure war is
here to stay.