New tool screens spam, digitizes books (ZDNet)
"A group of Carnegie Mellon University programmers has launched a service called ReCaptcha that can help cut down on spam while letting people digitize books. The project is a variation of the widely used 'Captcha' technique to weed out computer abuse such as e-mailing spam or posting spam on blog comments. Captchas require users to pass little pattern recognition tests, commonly reading distorted or obscured words."
Posted May 25, 2007 19:29 UTC (Fri)
by dlang (guest, #313)
When digitizing text, you don't know what the real text is ahead of time, so how in the world can you tell if you got the right answer? And once you have the right answer, it's no longer a benefit for digitizing books.
Posted May 25, 2007 19:42 UTC (Fri)
by stephen_pollei (guest, #23348)
So you take a 100-page book and scan it in. You check one paragraph from about every 10 pages by hand. Then you give the "human" three paragraphs' worth of text. The sample should be big enough to contain, hopefully, five or more errors.
You are really testing whether they found the errors you know about, but they will also report a few errors you didn't know about before.
That's my guess; I'll read the fine article and see if I was correct.
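The spot-check scheme guessed at above can be sketched as follows: the site knows where some OCR errors sit in the sample, passes the user only if they flag enough of them, and keeps any extra flags as candidate new errors. This is a minimal Python sketch under stated assumptions, not anything ReCaptcha is known to do: errors are identified by word position, the 80% pass threshold is arbitrary, and `check_proofread` is a hypothetical name.

```python
def check_proofread(known_errors, reported, min_recall=0.8):
    """Spot-check a user's proofreading of a sample passage.

    known_errors: set of word positions already known to be OCR errors.
    reported: set of word positions the user flagged as errors.
    min_recall: hypothetical fraction of known errors the user must find.
    Returns (passed, new_candidates).
    """
    if not known_errors:
        raise ValueError("the sample must contain at least one known error")
    found = known_errors & reported
    passed = len(found) / len(known_errors) >= min_recall
    # Flags outside the known set are only candidates; they are kept
    # only when the user has proven reliable on the known errors.
    new_candidates = reported - known_errors if passed else set()
    return passed, new_candidates


known = {3, 17, 42, 58, 90}
passed, extra = check_proofread(known, {3, 17, 42, 58, 77})
print(passed, sorted(extra))  # True [77]
```

Ignoring extra flags from users who missed too many known errors is one simple way to keep random clicking from polluting the list of new errors.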
Posted May 25, 2007 21:27 UTC (Fri)
by khim (subscriber, #9252)
"The service presents users with two words, one from a conventional Captcha test and the other an unknown word that a computerized optical character recognition couldn't figure out. If the user correctly identifies the known word, he or she is presumed to have decoded the unknown one."
Simple idea; I don't know why other projects aren't doing it...
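The two-word scheme quoted above can be sketched like this. It is a hedged illustration in Python, not ReCaptcha's actual implementation: the `Challenge` and `UnknownWordTally` classes, the case-insensitive comparison, and the idea of accepting a reading of the unknown word only after a hypothetical number of users agree are all assumptions made for the example. The tally shows one way a site could answer the earlier question of how to tell whether it got the right answer for a word nobody knows yet.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Challenge:
    control_answer: str  # word whose transcription is already known
    unknown_id: str      # hypothetical id of the word OCR could not read


@dataclass
class UnknownWordTally:
    min_agreement: int = 3  # hypothetical: identical guesses needed to accept
    votes: dict = field(default_factory=dict)

    def record(self, word_id, guess):
        self.votes.setdefault(word_id, Counter())[guess] += 1

    def decoded(self, word_id):
        """Return the accepted transcription, or None if no consensus yet."""
        counts = self.votes.get(word_id)
        if not counts:
            return None
        guess, n = counts.most_common(1)[0]
        return guess if n >= self.min_agreement else None


def normalize(text):
    return text.strip().lower()


def submit(challenge, control_guess, unknown_guess, tally):
    """Check the control word; if it is right, count the other guess."""
    if normalize(control_guess) != normalize(challenge.control_answer):
        return False  # failed the conventional Captcha test
    # Passing the known word is taken as evidence the user is human,
    # so their reading of the unknown word is recorded as a vote.
    tally.record(challenge.unknown_id, normalize(unknown_guess))
    return True


tally = UnknownWordTally()
ch = Challenge(control_answer="morning", unknown_id="book7-p12-w3")
submit(ch, "Morning", "upon", tally)
submit(ch, "mornin", "apon", tally)   # rejected: control word wrong
submit(ch, "morning", "upon", tally)
submit(ch, "morning", "upon", tally)
print(tally.decoded("book7-p12-w3"))  # upon
```

Note that the user is only checked on the control word; the unknown word's reading is never verified for a single user, which is why the sketch waits for several matching answers before trusting it.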
Posted May 27, 2007 22:56 UTC (Sun)
by Tobu (subscriber, #24111)
They are hosting the captchas on their own website, which is pinged whenever someone writes a comment on a ReCaptcha-enabled blog.
They certainly need to be clear that they won't set any tracking cookies or feed this data to an indexing bot.
Posted May 28, 2007 4:51 UTC (Mon)
by jimmybgood (guest, #26142)
Of course, websites could mark one of the two words as optional, but I bet most folks won't bother, particularly if it's difficult.
It sounds more like a tax than a research project. 150,000 free hours a day? Why not just recruit some people to try to improve OCR logic?
Posted May 28, 2007 19:24 UTC (Mon)
by Los__D (guest, #15263)
I know no one (besides you, whom I don't know except for your general negativity about most things) who is annoyed by them.
something doesn't make sense here
Captchas work by presenting text that's hard to machine read and seeing if the response matches the known text.
Maybe you overlap the testing area with text that you know and text that you don't know.
Privacy
I don't see their privacy policy.
Not gonna fly
People are already annoyed by having to solve captchas. I'm no different than most people and if all of a sudden I have to solve two, I'm going to find a place where I only need to solve one.
"People are already annoyed by having to solve captchas. I'm no different than most people and if all of a sudden I have to solve two, I'm going to find a place where I only need to solve one."
Then you go do that.
