By Jake Edge
March 19, 2008
Perhaps someday it will be considered discrimination against sentient
programs, but these days a way to distinguish between programs and humans
is required
for many web-based applications. Keeping spambots from posting comments in
weblogs or other bots from signing up for a web service are two of the most
common applications for separating humans and bots. As has often been the
case in the past, though, when the stakes are high enough, attackers will
find ways to circumvent barriers like this.
The most common means of testing for humans in web site sign-ups and the
like is a CAPTCHA
(Completely Automated Public Turing test to tell Computers and Humans
Apart). Typically these are images that contain some text that has been
mangled so that it is still recognizable by humans, but not by
programs—at least that is the theory. Variations on the theme
include asking math or "common sense" questions that programs
will supposedly not be able to figure out; more likely, no
attacker has had enough interest in breaking them. Serious CAPTCHAs
tend to use images that can be created on the fly, giving nearly infinite
variety.
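As a toy illustration of the question-based variant (not any real site's implementation), a challenge can be generated on the fly so that every visitor sees a fresh puzzle; the function names here are hypothetical:

```python
import random

def make_math_captcha():
    """Generate a simple arithmetic challenge and its expected answer.

    A stand-in for an image CAPTCHA: because the question is created
    on the fly, there is a nearly unlimited supply of variations.
    """
    a, b = random.randint(1, 20), random.randint(1, 20)
    question = f"What is {a} plus {b}?"
    return question, a + b

def check_answer(expected, response):
    """Accept the response only if it parses as the expected integer."""
    try:
        return int(response.strip()) == expected
    except ValueError:
        return False
```

Of course, a question this regular is trivially parseable by a bot, which is exactly the weakness the article notes: such schemes survive only as long as no attacker bothers to target them.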
Some of the most sophisticated CAPTCHAs are those used by various free web
mail services: Hotmail, Yahoo, and Gmail. These services provide quite a
bit of storage that might be of use to an attacker, but they also lend
their reputation to mail that gets sent from those accounts. Domains like
yahoo.com and gmail.com are very unlikely to be blacklisted. Mail coming
from those domains may also score lower in various spam testing rules,
which may be exactly what an attacker is looking for.
Various techniques have been tried in the past to circumvent CAPTCHAs, with
the most successful ones using humans. It seems that many folks will
happily solve
CAPTCHAs in order to view pornography or for cash.
Over the last year, though, CAPTCHA-breaking programs have started to appear.
In a very
detailed report, Websense presents evidence that Gmail's CAPTCHA has
been cracked. Earlier reports indicate that attackers have cracked
Yahoo, Windows Live, and Hotmail CAPTCHAs as well. Cracked does not mean a
100% success rate (humans cannot achieve that either); the attack just
needs to work often enough to provide the attackers with the accounts they
want.
These programs use some image processing and optical character recognition
(OCR) techniques to decipher the puzzle, removing humans from the equation
entirely. Typical success rates are in the 20-35% range. For attackers
with botnets available to spread out the work, this could yield an amazing
number of accounts in relatively short order.
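Some back-of-the-envelope arithmetic shows why even a modest solve rate is valuable at botnet scale; the bot and attempt counts below are illustrative assumptions, not measured figures:

```python
def expected_accounts(attempts_per_bot, bots, success_rate):
    """Expected number of accounts won from independent solving attempts."""
    return attempts_per_bot * bots * success_rate

# Hypothetical scenario: 1,000 bots each making 100 sign-up attempts
# at a 25% solve rate (the middle of the reported 20-35% range)
# yields on the order of 25,000 accounts.
accounts = expected_accounts(100, 1000, 0.25)
```

The per-attempt rate hardly matters to the attacker; the work is free once the botnet exists, so failed attempts cost almost nothing.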
CAPTCHAs have a number of bad characteristics: they are annoying to most
users and unusable by the visually impaired. Yet they are pervasive.
Alternate techniques using audio have so far been found wanting; a more
interesting method is Asirra from Microsoft
Research.
Asirra uses 3 million images of dogs and cats from animal shelters that
have been categorized. The test then shows a dozen random
images from the database and asks the "human" to select all the cat
photos. This would seem much more difficult for a program to handle. The
picture database would need regular updates to thwart attackers who simply
collect all of the images and do their own categorization, perhaps with
help from the same porn viewers or paid solvers mentioned earlier. Also,
computer recognition systems will someday be able to recognize dogs and cats.
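A rough sketch of why twelve binary choices resist brute force: a program guessing at random must classify every image correctly, so it passes only about once in 4,096 attempts. (This only covers blind guessing; the real attack surface is automated image classification.)

```python
def guess_pass_probability(images=12, p_correct=0.5):
    """Probability that random guessing classifies every image correctly.

    Each image is independently cat or dog, so a blind guesser passes
    with probability p_correct ** images, about 0.024% for 12 images.
    """
    return p_correct ** images
```

This is why the database's secrecy matters so much: an attacker who has categorized the images raises p_correct toward 1.0, and the exponent no longer helps.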
It is a difficult problem to solve, but one that needs to be addressed.
Systems like OpenID are not
enough; telling humans from bots is not what they were designed for, and
there is nothing stopping bots from having OpenIDs. Some mechanism that
would allow reputation or trust to accumulate on a given ID might help
prove that its holder is a human, or at least a well-behaved bot. Designing
a decentralized reputation service will also be difficult, but it is the
right direction for solving these kinds of problems.