Breaking CAPTCHA

By Jake Edge
March 19, 2008

Perhaps someday it will be considered discrimination against a sentient, but these days a way to distinguish between programs and humans is required for many web-based applications. Keeping spambots from posting comments in weblogs or other bots from signing up for a web service are two of the most common applications for separating humans and bots. As has often been the case in the past, though, when the stakes are high enough, attackers will find ways to circumvent barriers like this.

The most common means of testing for humans in web site sign-ups and the like is a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). Typically these are images that contain some text that has been mangled so that it is still recognizable by humans, but not by programs; at least that is the theory. Variations on the theme include asking math or "common sense" questions that programs will supposedly not be able to figure out, though more likely no attacker has had enough interest in breaking them. Serious CAPTCHAs tend to use images that can be created on the fly, giving nearly infinite variety.

Some of the most sophisticated CAPTCHAs are those used by various free web mail services: Hotmail, Yahoo, and Gmail. These services provide quite a bit of storage that might be of use to an attacker, but they also lend their reputation to mail that gets sent from those accounts. Domains like yahoo.com and gmail.com are very unlikely to be blacklisted. Mail coming from those domains may also score lower in various spam testing rules, which may be exactly what an attacker is looking for.

Various techniques have been tried in the past to circumvent CAPTCHAs, with the most successful ones using humans. It seems that many folks will happily solve CAPTCHAs in order to view pornography or for cash. Over the last year, though, CAPTCHA-breaking programs have started to appear.

In a very detailed report, Websense presents evidence that Gmail's CAPTCHA has been cracked. Earlier reports indicate that attackers have cracked Yahoo, Windows Live, and Hotmail CAPTCHAs as well. Cracked does not mean 100% success rate—humans cannot even achieve that—it just needs to work often enough to provide the attackers with the accounts they want.

These programs use some image processing and optical character recognition (OCR) techniques to decipher the puzzle, removing humans from the equation entirely. Typical success rates are in the 20-35% range. For attackers with botnets available to spread out the work, this could yield an amazing number of accounts in relatively short order.
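
A rough back-of-the-envelope sketch shows why a 20-35% success rate is more than enough. The success rates are the ones quoted above; the botnet size and the number of attempts per bot are hypothetical figures chosen only for illustration, and the calculation assumes that each attempt is independent.

```python
import math


def expected_accounts(bots, attempts_per_bot, success_rate):
    """Expected number of solved CAPTCHAs, assuming one account per solve."""
    return bots * attempts_per_bot * success_rate


def attempts_needed(success_rate, confidence=0.95):
    """Fewest independent attempts so that at least one succeeds
    with probability >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - success_rate))


# Hypothetical botnet: 10,000 machines making 100 attempts each.
for rate in (0.20, 0.35):
    accounts = expected_accounts(10_000, 100, rate)
    tries = attempts_needed(rate)
    print(f"rate {rate:.0%}: ~{accounts:,.0f} accounts, "
          f"{tries} tries for a 95% chance of one success")
```

Under these assumptions, even the low end of the range needs only 14 attempts for a 95% chance of at least one success, and the high end needs 7.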

CAPTCHAs have a number of bad characteristics: they are annoying to most and unusable by those who are visually impaired. Yet they are pervasive. Alternate techniques using audio have so far been found wanting; a more interesting method is Asirra from Microsoft Research.

Asirra uses 3 million images of dogs and cats from animal shelters that have been categorized. The test then shows a dozen random images from the database and asks the "human" to select all the cat photos. This would seem much more difficult for a program to handle. The picture database would need regular updates to thwart attackers just collecting all the images and doing their own categorization—perhaps with help from porn viewers or poor folk. Also, computer recognition systems will someday be able to recognize dogs and cats.
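
To see why a dozen images is a meaningful barrier, and why it weakens as recognition improves, consider the following sketch. It assumes that every image must be labeled correctly and that each label is an independent guess; the per-image accuracies are hypothetical values, not measurements of any real classifier.

```python
from fractions import Fraction


def pass_probability(per_image_accuracy, images=12):
    """Chance of labeling every image correctly, assuming independent guesses."""
    return per_image_accuracy ** images


# Blind guessing: a coin flip per image.
print(pass_probability(Fraction(1, 2)))   # 1/4096

# Hypothetical classifiers with various per-image accuracies.
for accuracy in (0.6, 0.8, 0.9):
    print(f"{accuracy:.0%} per image -> {pass_probability(accuracy):.2%} per challenge")
```

Blind guessing passes one challenge in 4096, but a classifier that is right 90% of the time per image would pass roughly 28% of challenges, which is comparable to the OCR success rates reported for text CAPTCHAs.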

It is a difficult problem to solve, but one that needs to be addressed. Systems like OpenID are not enough—it is not what they were designed for—as there is nothing stopping bots from having OpenIDs. Some mechanism that would allow reputation or trust to accumulate on a given ID might help prove that its holder is a human—or at least a well-behaved bot. Designing a reputation service that is decentralized will also be difficult, but it is the right direction for solving these kinds of problems.


Breaking CAPTCHA

Posted Mar 20, 2008 7:49 UTC (Thu) by eludias (subscriber, #4058) [Link]

The 3 million images probably don't need too much updating, since one can also apply CAPTCHA
techniques to those images: flip, rotate, stretch, and apply other transformations that will
leave the cats and dogs intact enough to be recognizable.
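
The transformations mentioned here can be sketched in a few lines of Python. This is only an illustration on a tiny grid of pixel values represented as nested lists; the function names are hypothetical, and a real system would presumably use an image library on actual photos.

```python
def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img):
    """Rotate the grid a quarter turn clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def stretch_horizontal(img, factor=2):
    """Widen the image by repeating each pixel `factor` times."""
    return [[px for px in row for _ in range(factor)] for row in img]


# A 2x2 "image" standing in for a photo.
img = [[1, 2],
       [3, 4]]

for transform in (flip_horizontal, rotate_90_clockwise, stretch_horizontal):
    print(transform.__name__, transform(img))
```

Each stored photo can then be served in many variants, so an attacker who has catalogued one version still has to match it against the transformed copies.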

PWNtcha

Posted Mar 20, 2008 12:21 UTC (Thu) by patrick_g (subscriber, #44470) [Link]

Do you know PWNtcha?
It's a CAPTCHA decoder written by Sam Hocevar (the current Debian project leader).
It seems that there are a lot of deficient CAPTCHA implementations out there.

PWNtcha

Posted Mar 27, 2008 8:31 UTC (Thu) by gvy (guest, #11981) [Link]

Somehow this adds to his WTFPL, and detracts from such leadership... at least for me.

Breaking CAPTCHA

Posted Mar 20, 2008 13:46 UTC (Thu) by gypsumfantastic (guest, #31134) [Link]

Updated vulnerabilities? I thought they'd been shown the door. 

And here was I preparing for a PG DN-frenzy-free LWN security page. Sad panda.

Updated vulnerabilities

Posted Mar 20, 2008 13:50 UTC (Thu) by corbet (editor, #1) [Link]

We asked the question - but remember that only today can the non-subscribers comment on it. In any case, we never said what we were actually going to do... The section will almost certainly be going away, but it may take another week or two to get the work done.

Breaking CAPTCHA

Posted Mar 20, 2008 14:56 UTC (Thu) by creyes123 (guest, #49450) [Link]

"—perhaps with help from porn viewers or poor folk."


Maybe it's just me, but I got a huge laugh out of this line.

Breaking CAPTCHA

Posted Mar 20, 2008 19:50 UTC (Thu) by JLCdjinn (guest, #1905) [Link]

It's not just you.  I'll bet my colleagues think I'm strange, laughing heartily in the middle
of the cube farm.

Cats @ Asirra

Posted Mar 20, 2008 20:41 UTC (Thu) by MrWim (subscriber, #47432) [Link]

Based on the last 30 minutes I've spent looking at the cats on the Asirra site, I don't think
the spammers will need to provide any extra motivation to get people to solve them.

Cats @ Asirra

Posted Mar 20, 2008 21:32 UTC (Thu) by bronson (subscriber, #4806) [Link]

That's an interesting idea...  maybe someone should combine captcha with hotornot.  "Click on
the 3 hottest people in these 8 pictures".  See, recognizing cats vs. dogs shouldn't be too
hard...  you could probably get pretty far with simple contrast / SIFT tricks.  But good luck
building an adequate corpus to allow a computer to determine hotness!

This might also inspire people to have really long passwords.

Cats @ Asirra

Posted Mar 20, 2008 21:38 UTC (Thu) by Felix.Braun (subscriber, #3032) [Link]

I've been utterly unable to get past the hotornot test :-( And I assure you, I am perfectly
human. ... Or at least I thought so. Time to re-take my Voight-Kampff test ;-)

hotornot strategy

Posted Mar 20, 2008 22:40 UTC (Thu) by dark (subscriber, #8483) [Link]

It's easy. Choose 'female' and select the three pictures with the most skin showing; those will be the "hot" ones. A skin tone detector should pass it without difficulty :)

hotornot strategy

Posted Mar 20, 2008 23:03 UTC (Thu) by zlynx (subscriber, #2285) [Link]

No, no, that strategy will get you the very overweight females in bikinis.  

NOT hot!

Breaking CAPTCHA

Posted Mar 21, 2008 2:39 UTC (Fri) by lipak (guest, #43911) [Link]

> Various techniques have been tried in the past to circumvent
> CAPTCHAs, with the most successful ones using humans. 

I was surprised to see that the article did not explicitly link to
the AI approaches to breaking CAPTCHAs which are listed at the end of
http://www.captcha.net/. Some of these papers are quite old and date
back to the time when CAPTCHAs first started appearing on the 'net
about 5 years ago.

It may just be that computers have become faster over the past five
years *and* the programs have had time to gather enough data to enable
the above approaches to increase their success rate.

Kapil.
--


Funny/scary CAPTCHA use

Posted Mar 21, 2008 18:41 UTC (Fri) by Max.Hyre (subscriber, #1054) [Link]

Take a look at CERT's comment page. We're supposed to look to them for security? And, yes, I pointed it out to them.

Breaking CAPTCHA

Posted Mar 21, 2008 21:47 UTC (Fri) by job (guest, #670) [Link]

I never understood the hype behind these. I mean, there are plenty of things that tell humans
and computers apart, but distinguishing twisted characters isn't one of them. Maybe that's
just me, but I frequently get them wrong and wish for a Firefox plugin that could solve them
for me :/ .

Breaking CAPTCHA

Posted Mar 22, 2008 2:25 UTC (Sat) by dvdeug (subscriber, #10998) [Link]

One of the things that tell humans and computers apart /is/ distinguishing characters. Basic
vision is one of the issues that drives modern AI research. It was several years ago, but one
test of OCR programs at UNLV showed that the best OCR program was inferior to a six-year-old
in correctly recognizing text. Currently in transcribing etexts, Distributed Proofreaders
(which transcribes for Project Gutenberg) does an opening level of OCR, but there's still a lot
of cleanup for random volunteers to do.

Breaking CAPTCHA

Posted Mar 25, 2008 5:56 UTC (Tue) by Max.Hyre (subscriber, #1054) [Link]

> Maybe that's just me, but I frequently get them wrong [....]

Nope, it's not just you. I just blew at least five (six? seven?) attempts at recognizing Yahoo CAPTCHAs. I was so bad that they locked me out for 24 hours, because they think I'm trying to break in. :-/

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds