OT: Does usable free OCR software exist?

Posted Aug 24, 2006 17:27 UTC (Thu) by bronson (subscriber, #4806)
In reply to: OT: Does usable free OCR software exist? by debacle
Parent article: Fighting image spam

It totally depends on your source images. High-resolution scans of clean, printed text tend to do pretty well. Blurry or misregistered scans (e.g. a book page scanned without first tearing it out) are trouble. If any letters bleed together, you're sunk.

I agree that Linux OCR is nascent but, even so, one typo per word is way too high. I would guess that your source images are flawed somehow?



OT: Does usable free OCR software exist?

Posted Aug 24, 2006 18:08 UTC (Thu) by debacle (subscriber, #7114)

You are right, the input was not perfect: Google scans of a mid-18th-century book. Still, I had hoped that the OCR software would do better.

OT: Does usable free OCR software exist?

Posted Aug 25, 2006 19:25 UTC (Fri) by leoc (subscriber, #39773)

One thing you might want to try is to "posterize" the image you are scanning. I wrote a perl script that uses gocr to scan my satellite television source for channels that I do not receive (they are the ones that come up with a screen of text that says something to that effect), and I found that I had to posterize the output to 2 colours (black text on a white background) before gocr could read any text off them.
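
For anyone who wants to try the posterizing step, here is a rough sketch in Python. It is not the perl script mentioned above. The function names, the fixed threshold of 128, and the sample pixel values are all made up for illustration, and it assumes the image is already available as rows of 0-255 grayscale values. It writes plain PBM on the assumption that your OCR tool accepts PNM-family input.

```python
def posterize_two_level(gray_rows, threshold=128):
    """Map each 0-255 grayscale value to 0 (black) or 255 (white).

    The threshold is an illustrative default; real scans may need tuning.
    """
    return [[255 if value >= threshold else 0 for value in row]
            for row in gray_rows]


def to_pbm(binary_rows):
    """Serialize a black/white image as plain PBM (P1), where 1 means black."""
    height = len(binary_rows)
    width = len(binary_rows[0]) if height else 0
    lines = ["P1", f"{width} {height}"]
    for row in binary_rows:
        lines.append(" ".join("1" if value == 0 else "0" for value in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical 4x3 grayscale patch: dark strokes on a light background.
    sample = [
        [250, 240, 30, 245],
        [235, 20, 25, 230],
        [228, 90, 200, 251],
    ]
    print(to_pbm(posterize_two_level(sample)), end="")
```

A fixed threshold is the simplest choice. If the background brightness varies a lot across the page, as it often does with old book scans, a threshold picked per image (or per region) will probably work better.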

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds